# The Agent-Environment Interaction

In this exercise, you will implement the interaction of a reinforecment learning agent with its environment. We will use the gridworld environment from the second lecture. You will find a description of the environment below, along with two pieces of relevant material from the lectures: the agent-environment interface and the Q-learning algorithm.

1. Create an agent that chooses actions randomly with this environment. 

2. Create an agent that uses Q-learning. You can use initial Q values of 0, a stochasticity parameter for the $\epsilon$-greedy policy function $\epsilon=0.05$, and a learning rate $\alpha = 0.1$. But feel free to experiment with other settings of these three parameters.

3. Plot the mean total reward (i.e. the undiscounted return) obtained by the two agents for each episode. This kind of graph is called a **learning curve**, and it gives us an idea of how our agent's performance changes during training.


## The agent-environment interface

<img src="img/agent-environment.png" style="width: 500px;" align="left"/> 

<br><br><br>

The interaction of the agent with its environments starts at decision stage $t=0$ with the observation of the current state $s_0$. (Notice that there is no reward at this initial stage.) The agent then chooses an action to execute at decision stage $t=1$. The environment responds by changing its state to $s_1$ and returning the numerical reward signal $r_1$. 


## The environment: Navigation in a gridworld

<img src="img/gold.png" style="width: 250px;" align="left"/>

The agent has four possible actions in each state (grid square): west, north, south, and east. The actions are unreliable. They move the agent in the intended direction with probability 0.8, and with probability 0.2, they move the agent in a random other direction. If the direction of movement is blocked, the agent remains in the same grid square. The initial state of the agent is one of the five grid squares at the bottom, selected randomly. The grid squares with the gold and the bomb are **terminal states**. If the agent finds itself in one of these squares, the episode ends. Then a new episode begins with the agent at a randomly selected initial state.

You will use a reinforcement learning algorithm to compute the best policy for finding the gold with as few steps as possible while avoiding the bomb. For this, we will use the following reward function: $-1$ for each navigation action, an additional $+10$ for finding the gold, and an additional $-10$ for hitting the bomb. For example, the immediate reward for transitioning into the square with the gold is $-1 + 10 = +9$. Do not use discounting (that is, set $\gamma=1$).

## Q-Learning
For your reference, the pseudocode for the Q-Learning algorithm is reproduced below (Reinforcement Learning, Sutton & Barto, 2018, Section 6.5 p.131).
<img src="img/q.png" style="width: 720px;"/>


## Example of a learning curve

<img src="img/lc_example.png" style="width: 550px;" align="left"/>

<br><br><br><br>

This is a sample learning curve and shows the reward obtained by a Q-learning agent across 500 episodes. Do not try to replicate this exact curve! It was computed using a different environment than the one described here.

In [1]:
# environment
import numpy as np

class Gridworld:
    def __init__(self):
        self.num_rows = 5
        self.num_cols = 5
        self.num_cells = self.num_cols * self.num_rows
        self.random_move_probability = 0.2
        
        # Choose starting position of the agent randomly among the first 5 cells
        self.agent_position = np.random.randint(0, 5)
        
        # Choose position of the gold and bomb
        self.bomb_positions = np.array([18])
        self.gold_positions = np.array([23])
        self.terminal_states = np.array([self.bomb_positions, self.gold_positions])
       
        # Specify rewards
        self.rewards = np.zeros(self.num_cells)
        self.rewards[self.bomb_positions] = -10
        self.rewards[self.gold_positions] = 10
        
        # Specify available actions
        self.actions = ["UP", "RIGHT", "DOWN", "LEFT"]
        self.num_actions = len(self.actions)
        
    def get_available_actions(self):
        return self.actions
    
    def get_start_location(self):
        return self.agent_position
    
    def make_step(self, action_index): 
        # Randomly sample action_index if world is stochastic
        if np.random.uniform(0, 1) < self.random_move_probability:
            action_indices = np.arange(self.num_actions, dtype=int)
            action_indices = np.delete(action_indices, action_index)
            action_index = np.random.choice(action_indices, 1)[0]

        action = self.actions[action_index]

        # Determine new position and check whether the agent hits a wall.
        old_position = self.agent_position
        new_position = self.agent_position
        if action == "DOWN":
            candidate_position = old_position + self.num_cols
            if candidate_position < self.num_cells:
                new_position = candidate_position
        elif action == "RIGHT":
            candidate_position = old_position + 1
            if candidate_position < self.num_cells:
                new_position = candidate_position
        elif action == "UP":
            candidate_position = old_position - self.num_cols
            if candidate_position >= 0:
                new_position = candidate_position
        elif action == "LEFT":  # "LEFT"
            candidate_position = old_position - 1
            if candidate_position >= 0:
                new_position = candidate_position
        else:
            raise ValueError('Action was mis-specified!')

        # Update the environment state
        self.agent_position = new_position
        
        # Calculate reward
        reward = self.rewards[self.agent_position]
        reward -= 1
        return reward, new_position
    
    def reset(self):
        self.agent_position = np.random.randint(0, 5)
        return self.agent_position
        

In [2]:
class RandomAgent():
    def choose_action(self, available_actions):
        number_of_actions = len(available_actions)
        random_action_index = np.random.randint(0, number_of_actions)
        return random_action_index

In [3]:
class QLearningAgent():
    def __init__(self, start_location):
        # table in format: Position, Action, Q-Value
        #self.q_table = np.zeros((1,2))
        self.q_table = {}
        self.position = start_location
    
    def choose_action(self, available_actions):
        # check if current action/state in q-table
        for i in range(4):
            if  not (self.position, i) in self.q_table:
                self.q_table[self.position, i] = 0
        
        # decide if random or greedy
        # random action 5% time, greedy 95%
        if np.random.uniform(0, 1) > 0.95:
            print("Random action chosen.")
            # random action
            number_of_actions = len(available_actions)
            return np.random.randint(0, number_of_actions)
        else:
            # greedy action
            print("Greedy action chosen.")
            best_action = 0
            for i in range(4):
                if self.q_table[self.position, i] > self.q_table[self.position, best_action]:
                    best_action = i
            return best_action
        
    def update_table(self, reward, new_position, action):
        # find q value and update table with it
        learning_rate = 0.1
        # find new estimate
        newEstimate = 0
        for i in range(4):
            if  not (new_position, i) in self.q_table:
                self.q_table[new_position, i] = 0
            else:
                if self.q_table[new_position, i] > newEstimate:
                    newEstimate = self.q_table[new_position, i]
        q_value = round(((1-learning_rate) * self.q_table[self.position, action]) + learning_rate*(reward + newEstimate), 2)
        self.q_table[self.position, action] = q_value
        # update position to new position
        self.position = new_position

    def print_table(self):
        print(self.q_table)
        
    def update_position(self, new_position):
        self.position = new_position

In [4]:
env = Gridworld()
randomAgent = RandomAgent()
Qagent = QLearningAgent(env.get_start_location())

In [5]:
available_actions = env.get_available_actions()
print("Available_actions =", available_actions)
chosen_action = Qagent.choose_action(available_actions)
print("Chosen action =", chosen_action)
reward, position = env.make_step(chosen_action)
Qagent.update_table(reward, position, chosen_action)

Available_actions = ['UP', 'RIGHT', 'DOWN', 'LEFT']
Greedy action chosen.
Chosen action = 0


In [None]:
episodeNumber = 500
yValues = []
xValues = range(episodeNumber)
for i in range(episodeNumber):
    print("Episode: ", i)
    totalReward = 0
    reward = 0
    while reward != 9 and reward != -11:
        print("Current position of the agent =", env.agent_position)
        available_actions = env.get_available_actions()
        chosen_action = Qagent.choose_action(available_actions)
        print("Chosen action =", available_actions[chosen_action])
        reward, position = env.make_step(chosen_action)
        print("Reward obtained =", reward)
        Qagent.update_table(reward, position, chosen_action)
        print("Current position of the agent =", env.agent_position)
        totalReward+= reward
    starting_location = env.reset()
    Qagent.update_position(starting_location)
    yValues.append(totalReward)

Episode:  0
Current position of the agent = 1
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 4
Current position of the agent = 4
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 4
Current position of the agent = 4
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 5
Current position of the agent = 5
Greedy action chos

Current position of the agent = 9
Random action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 4
Current position of the agent = 4
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 9
Current position of the agent = 9
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 14
Current position of the agent = 14
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 15
Current position of the agent = 15
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 20
Current position of the agent = 20
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 15
Current position of the agent = 15
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 10
Current position of the agent = 10
Greedy action chosen.
C

Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 5
Current position of the agent = 5
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 6
Current position of the agent = 6
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 5
Current position of the agent = 5
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 10
Current position of the agent = 10
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 11
Current position of the agent = 11
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 6
Current position of the 

Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 17
Current position of the agent = 17
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 12
Current position of the agent = 12
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 11
Current position of the agent = 11
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 12
Current position of the agent = 12
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 7
Current position of the agent = 7
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 6
Current position of the agent = 6
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current posi

Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 6
Current position of the agent = 6
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 11
Current position of the agent = 11
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of

Current position of the agent = 14
Current position of the agent = 14
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 15
Current position of the agent = 15
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 14
Current position of the agent = 14
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 9
Current position of the agent = 9
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 10
Current position of the agent = 10
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 11
Current position of the agent = 11
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 6
Current position of the agent = 6
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of th

Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent 

Reward obtained = 9.0
Current position of the agent = 23
Episode:  15
Current position of the agent = 3
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 4
Current position of the agent = 4
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 4
Current position of the agent = 4
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 4
Current position of the agent = 4
Random action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agen

Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 11
Current position of the agent = 11
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 16
Current position of the agent = 16
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 21
Current position of the agent = 21
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 16
Current position of the agent = 16
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 21
Current position of the agent = 21
Random action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 22
Current position of the agent = 22
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = 9.0
Current position of the agent = 23
Episode:  19
Current position of the agent = 0
Greedy action chosen.
Chosen action = LEFT
Reward obtaine

Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 16
Current position of the agent = 16
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 11
Current position of the agent = 11
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 6
Current position of the agent = 6
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 5
Current position of the agent = 5
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 4
Current position of the agent = 4
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 9
Current position of the agent = 9
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 14
Current position of the agent = 14
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position 

Current position of the agent = 12
Current position of the agent = 12
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 11
Current position of the agent = 11
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 12
Current position of the agent = 12
Random action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 17
Current position of the agent = 17
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 12
Current position of the agent = 12
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 7
Current position of the agent = 7
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Random action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the 

Reward obtained = -1.0
Current position of the agent = 21
Current position of the agent = 21
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 22
Current position of the agent = 22
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = 9.0
Current position of the agent = 23
Episode:  32
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 5
Current position of the agent = 5
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agen

Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 22
Current position of the agent = 22
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = 9.0
Current position of the agent = 23
Episode:  35
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 7
Current position of the agent = 7
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 8
Current position of the agent = 8
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 7
Current position of the agent = 7
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 12
Current position of the agent = 12
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 13
Current position of the agent = 13
Greedy action chosen.
Chosen action = LEFT

Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2


Reward obtained = -1.0
Current position of the agent = 10
Current position of the agent = 10
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 9
Current position of the agent = 9
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 8
Current position of the agent = 8
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 7
Current position of the agent = 7
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 6
Current position of the agent = 6
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current

Current position of the agent = 0
Random action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 5
Current position of the agent = 5
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 10
Current position of the agent = 10
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 15
Current position of the agent = 15
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 16
Current position of the agent = 16
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 21
Current position of the agent = 21
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 22
Current position of the agent = 22
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = 9.0
Current position of the agent = 23
Episode:  44
Current position of the agent = 2
Gree

Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current posi

Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1


Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of

Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current posi

Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Random action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 5
Current position of the agent = 5
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy

Reward obtained = -1.0
Current position of the agent = 5
Current position of the agent = 5
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of

Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 4
Current position of the agent = 4
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 4
Current position of the agent = 4
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 4
Current position of the agent = 4
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 4
Current position of the agent = 4
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 4
Current position of the agent = 4
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 4
Current position of the agent = 4
Greedy a

Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 4
Current position of the agent = 4
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 4
Current position of the agent = 4
Random action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 4
Current position of the agent = 4
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 4
Current position of the agent = 4
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 4
Current position of the agent = 4
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of

Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 24
Current position of the agent = 24
Greedy action chosen.
Chosen action = LEFT
Reward obtained = 9.0
Current position of the agent = 23
Episode:  57
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current positio

Current position of the agent = 9
Current position of the agent = 9
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 14
Current position of the agent = 14
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 19
Current position of the agent = 19
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 24
Current position of the agent = 24
Greedy action chosen.
Chosen action = LEFT
Reward obtained = 9.0
Current position of the agent = 23
Episode:  58
Current position of the agent = 4
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 4
Current position of the agent = 4
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 4
Current position of the agent = 4
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 4
Current positio

Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 5
Current position of the agent = 5
Greedy a

Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of

Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 6
Current position of the agent = 6
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Random action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current po

Reward obtained = -1.0
Current position of the agent = 8
Current position of the agent = 8
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 4
Current position of the agent = 4
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 4
Current position of the agent = 4
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 5
Current position of the agent = 5
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of

Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
R

Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of

Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
R

Current position of the agent = 3
Random action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 8
Current position of the agent = 8
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 7
Current position of the agent = 7
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 6
Current position of the agent = 6
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 5
Current position of the agent = 5
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 4
Current position of the agent = 4
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 4
Current position of the agent = 4
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 4
Current position of the agent = 4
Greedy action chosen.
Chosen action = U

Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Random action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position

Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Random action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 5
Current position of the agent = 5
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of

Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current posi

Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Random action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 6
Current position of the agent = 6
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP

Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of

Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Random action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 6
Current position of the agent = 6
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy

Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of

Reward obtained = -1.0
Current position of the agent = 5
Current position of the agent = 5
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 10
Current position of the agent = 10
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 9
Current position of the agent = 9
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 8
Current position of the agent = 8
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 9
Current position of the agent = 9
Random action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 10
Current position of the agent = 10
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 9
Current position of the agent = 9
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 14


Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of

Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Random action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Random action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position

Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
R

Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
R

Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
R

Current position of the agent = 7
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = UP
R

Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Random action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy a

Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0


Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 6
Current position of the agent = 6
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
R

Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 6
Current position of the agent = 6
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 6
Current position of the agent = 6
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current posi

Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 5
Current position of the agent = 5
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0


Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 7
Current position of the agent = 7
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0


Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 6
Current position of the agent = 6
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
R

Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Random action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Random action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 8
Current position of the agent = 8
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 9
Current position of the agent = 9
Greedy action chosen.
Chosen actio

Reward obtained = -1.0
Current position of the agent = 6
Current position of the agent = 6
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of

Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of

Reward obtained = -1.0
Current position of the agent = 12
Current position of the agent = 12
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 13
Current position of the agent = 13
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 14
Current position of the agent = 14
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 15
Current position of the agent = 15
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 14
Current position of the agent = 14
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 19
Current position of the agent = 19
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 24
Current position of the agent = 24
Greedy action chosen.
Chosen action = LEFT
Reward obtained = 9.0
Current position of the 

Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 19
Current position of the agent = 19
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 24
Current position of the agent = 24
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 24
Current position of the agent = 24
Greedy action chosen.
Chosen action = LEFT
Reward obtained = 9.0
Current position of the agent = 23
Episode:  169
Current position of the agent = 4
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 5
Current position of the agent = 5
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Rewar

Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Random action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 1
Current position 

Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Random action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position

Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of

Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 8
Current position of the agent = 8
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 7
Current pos

Reward obtained = -1.0
Current position of the agent = 8
Current position of the agent = 8
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 9
Current position of the agent = 9
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 14
Current position of the agent = 14
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 19
Current position of the agent = 19
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 24
Current position of the agent = 24
Greedy action chosen.
Chosen action = LEFT
Reward obtained = 9.0
Current position of the agent = 23
Episode:  201
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of th

Reward obtained = -1.0
Current position of the agent = 9
Current position of the agent = 9
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 14
Current position of the agent = 14
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 15
Current position of the agent = 15
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 14
Current position of the agent = 14
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 19
Current position of the agent = 19
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 24
Current position of the agent = 24
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 19
Current position of the agent = 19
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the age

Reward obtained = -1.0
Current position of the agent = 6
Current position of the agent = 6
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 8
Current position of the agent = 8
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 9
Current pos

Reward obtained = -1.0
Current position of the agent = 10
Current position of the agent = 10
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 15
Current position of the agent = 15
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 14
Current position of the agent = 14
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 19
Current position of the agent = 19
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 24
Current position of the agent = 24
Greedy action chosen.
Chosen action = LEFT
Reward obtained = 9.0
Current position of the agent = 23
Episode:  227
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of

Current position of the agent = 8
Current position of the agent = 8
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 9
Current position of the agent = 9
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 14
Current position of the agent = 14
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 15
Current position of the agent = 15
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 14
Current position of the agent = 14
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 19
Current position of the agent = 19
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 24
Current position of the agent = 24
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 19
Current position

Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 0
Current position of the agent = 0
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 5
Current position of the agent = 5
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 4
Current position of the agent = 4
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 9
Current position of the agent = 9
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 4
Current position of the agent = 4
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 9
Current position of the agent = 9
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 8
Current position of the agent = 8
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of t

Reward obtained = -1.0
Current position of the agent = 5
Current position of the agent = 5
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 4
Current position of the agent = 4
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 9
Current position of the agent = 9
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 14
Current position of the agent = 14
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 19
Current position of the agent = 19
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 24
Current position of the agent = 24
Greedy action chosen.
Chosen action = LEFT
Reward obtained = 9.0
Current position of the agent = 23
Episode:  264
Current position of the agent = 3
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of

Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Random action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 2
Current position of the agent = 2
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 8
Current position of the agent = 8
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 7
Current position of the agent = 7
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the

Reward obtained = -1.0
Current position of the agent = 3
Current position of the agent = 3
Random action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 8
Current position of the agent = 8
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 9
Current position of the agent = 9
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 8
Current position of the agent = 8
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 9
Current position of the agent = 9
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 8
Current position of the agent = 8
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 9
Current position of the agent = 9
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 14
C

Chosen action = LEFT
Reward obtained = 9.0
Current position of the agent = 23
Episode:  299
Current position of the agent = 4
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 9
Current position of the agent = 9
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 14
Current position of the agent = 14
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 19
Current position of the agent = 19
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 24
Current position of the agent = 24
Greedy action chosen.
Chosen action = LEFT
Reward obtained = 9.0
Current position of the agent = 23
Episode:  300
Current position of the agent = 0
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 5
Current position of the agent = 5
Greedy action chosen.
Chosen action = LEFT
Reward 

Greedy action chosen.
Chosen action = LEFT
Reward obtained = 9.0
Current position of the agent = 23
Episode:  316
Current position of the agent = 3
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 8
Current position of the agent = 8
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 9
Current position of the agent = 9
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 14
Current position of the agent = 14
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 19
Current position of the agent = 19
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 24
Current position of the agent = 24
Greedy action chosen.
Chosen action = LEFT
Reward obtained = 9.0
Current position of the agent = 23
Episode:  317
Current position of the agent = 0
Greedy action chosen.
Chose

Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 9
Current position of the agent = 9
Random action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 8
Current position of the agent = 8
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 9
Current position of the agent = 9
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 14
Current position of the agent = 14
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 19
Current position of the agent = 19
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 24
Current position of the agent = 24
Greedy action chosen.
Chosen action = LEFT
Reward obtained = 9.0
Current position of the agent = 23
Episode:  335
Current position of the agent = 3
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1

Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 4
Current position of the agent = 4
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 9
Current position of the agent = 9
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 14
Current position of the agent = 14
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 9
Current position of the agent = 9
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 14
Current position of the agent = 14
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 19
Current position of the agent = 19
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 14
Current position of the agent = 14
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current posi

Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 24
Current position of the agent = 24
Greedy action chosen.
Chosen action = LEFT
Reward obtained = 9.0
Current position of the agent = 23
Episode:  364
Current position of the agent = 3
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 8
Current position of the agent = 8
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 9
Current position of the agent = 9
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 14
Current position of the agent = 14
Random action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 13
Current position of the agent = 13
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 14
Current position of the agent = 14
Greedy action chosen.
Chosen action = DOWN
Reward obtained =

Current position of the agent = 14
Current position of the agent = 14
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 19
Current position of the agent = 19
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 24
Current position of the agent = 24
Greedy action chosen.
Chosen action = LEFT
Reward obtained = 9.0
Current position of the agent = 23
Episode:  379
Current position of the agent = 3
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 8
Current position of the agent = 8
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 9
Current position of the agent = 9
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 14
Current position of the agent = 14
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 19
Curr

Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 7
Current position of the agent = 7
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 8
Current position of the agent = 8
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 9
Current position of the agent = 9
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 14
Current position of the agent = 14
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 9
Current position of the agent = 9
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 4
Current position of the agent = 4
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 4
Current position of the agent = 4
Random action chosen.
Chosen action = RIGHT
Reward obtain

Reward obtained = -1.0
Current position of the agent = 8
Current position of the agent = 8
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 9
Current position of the agent = 9
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 14
Current position of the agent = 14
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 19
Current position of the agent = 19
Random action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 24
Current position of the agent = 24
Greedy action chosen.
Chosen action = LEFT
Reward obtained = 9.0
Current position of the agent = 23
Episode:  409
Current position of the agent = 3
Random action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 8
Current position of the agent = 8
Greedy action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position 

Reward obtained = -1.0
Current position of the agent = 24
Current position of the agent = 24
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 24
Current position of the agent = 24
Random action chosen.
Chosen action = RIGHT
Reward obtained = -1.0
Current position of the agent = 19
Current position of the agent = 19
Random action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 24
Current position of the agent = 24
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 19
Current position of the agent = 19
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 24
Current position of the agent = 24
Greedy action chosen.
Chosen action = LEFT
Reward obtained = -1.0
Current position of the agent = 24
Current position of the agent = 24
Greedy action chosen.
Chosen action = LEFT
Reward obtained = 9.0
Current position of the a

Current position of the agent = 24
Greedy action chosen.
Chosen action = LEFT
Reward obtained = 9.0
Current position of the agent = 23
Episode:  440
Current position of the agent = 2
Greedy action chosen.
Chosen action = DOWN
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
Chosen action = UP
Reward obtained = -1.0
Current position of the agent = 1
Current position of the agent = 1
Greedy action chosen.
C

In [None]:
import matplotlib.pyplot as plt
plt.plot(xValues, yValues)
plt.xlabel('Episodes')
plt.ylabel('Reward')
plt.title("Learning Curve of a Q-Learning Policy Agent")
plt.show()


In [None]:
episodeNumber = 500
yValues = []
xValues = range(episodeNumber)
for i in range(episodeNumber):
    print("Episode: ", i)
    totalReward = 0
    reward = 0
    while reward != 9 and reward != -11:
        print("Current position of the agent =", env.agent_position)
        available_actions = env.get_available_actions()
        chosen_action = randomAgent.choose_action(available_actions)
        print("Chosen action =", available_actions[chosen_action])
        reward, position = env.make_step(chosen_action)
        print("Reward obtained =", reward)
        print("Current position of the agent =", env.agent_position)
        totalReward+= reward
    starting_location = env.reset()
    yValues.append(totalReward)

In [None]:
import matplotlib.pyplot as plt
plt.plot(xValues, yValues)
plt.xlabel('Episodes')
plt.ylabel('Reward')
plt.title("Learning Curve of a Random Policy Agent")
plt.show()