First, we import the libraries and classes that the experiments need.

The imported classes include the following:
<ul>
    <li><b>FullyObservableEnvironment:</b></li>
        &emsp; An environment in which the agent perceives the location of every piece of gold and every trap. All other relevant parts of the environment are visible as well.
    <li><b>PartiallyObservableEnvironment:</b></li>
        &emsp; An environment in which part of the state is hidden, that is, the agent(s) can never see the entire state of the environment at once. Solving this kind of environment requires agents with memory.
    <li><b>ReflexAgent:</b></li>
        &emsp; This class implements a Simple Reflex Agent, which acts only on the basis of the percept it currently receives from the environment. Its actions are determined by condition-action rules.
    <li><b>ModelBasedAgent:</b></li>
        &emsp; This kind of agent maintains an internal structure that describes the part of the world it cannot currently see. This knowledge is called the model of the world.
</ul>
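
To make the idea of condition-action rules concrete, here is a minimal sketch of how a simple reflex agent could choose an action from its current percept alone. It is not the code of the `ReflexAgent` class: the percept format (a 3x3 grid of gold counts centred on the agent), the function name `reflex_action`, the direction name `LEFT` and the three rules are all assumptions made for illustration. Only the action names `ADVANCE` and `TURN` come from the run logs in this notebook.

```python
from typing import List

# Hypothetical percept format: 3x3 grid of gold counts, agent in the centre.
Percept = List[List[int]]

# Cell directly ahead of the agent, as (row, col) inside the 3x3 percept.
AHEAD = {"UP": (0, 1), "RIGHT": (1, 2), "DOWN": (2, 1), "LEFT": (1, 0)}


def reflex_action(percept: Percept, facing: str) -> str:
    """Choose an action from the current percept only (no memory)."""
    row, col = AHEAD[facing]
    if percept[row][col] > 0:
        return "ADVANCE"  # rule 1: gold directly ahead, step onto it
    if any(percept[r][c] > 0 for r, c in AHEAD.values()):
        return "TURN"     # rule 2: gold next to the agent, turn towards it
    return "ADVANCE"      # rule 3: no gold in sight, keep exploring


# Gold in the cell above an agent that is facing UP.
print(reflex_action([[0, 1, 0], [0, 0, 0], [0, 0, 0]], "UP"))
```

Because the function receives nothing but the current percept, it forgets any gold that leaves its field of view, which is exactly the limitation that a model-based agent addresses.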

In [1]:
import numpy as np
import random

from FullyObservableEnvironment import FullyObservableEnvironment
from PartiallyObservableEnvironment import PartiallyObservableEnvironment
from ReflexAgent import ReflexAgent
from ModelBasedAgent import ModelBasedAgent
from Objects import *

<h1><b>Partially Observable Environment</b></h1>

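The first agent to be tested is the Reflex Agent in a Partially Observable Environment.
In addition to the agent, we add 5 pieces of gold and 6 traps at fixed positions. Positions are given as (row, column), and several objects may share a cell, as the printed grid below shows.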
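
The following sketch may help when reading the printed grids. It takes the gold and trap positions used throughout this notebook and rebuilds the `(A G T)` cells, where `A` is the agent marker, `G` the number of gold pieces and `T` the number of traps in that cell. The `render` function is a hypothetical helper written for this explanation and is not part of the environment classes.

```python
from collections import Counter

# Positions used in this notebook, as (row, column).
gold_positions = [(4, 0), (0, 1), (2, 3), (1, 4), (1, 4)]
trap_positions = [(1, 0), (3, 1), (3, 1), (3, 1), (2, 3), (4, 4)]


def render(gold, traps, agent=None, size=5):
    """Return one text line per row, each cell shown as (A G T)."""
    gold_count, trap_count = Counter(gold), Counter(traps)
    lines = []
    for row in range(size):
        cells = []
        for col in range(size):
            # agent is ((row, col), marker), for example ((3, 3), "U")
            a = agent[1] if agent and agent[0] == (row, col) else "-"
            g = gold_count.get((row, col)) or "-"  # "-" when the count is zero
            t = trap_count.get((row, col)) or "-"
            cells.append(f"({a} {g} {t})")
        lines.append(f"{row} " + " ".join(cells))
    return lines


grid = render(gold_positions, trap_positions, agent=((3, 3), "U"))
print("\n".join(grid))
```

Repeated positions simply add up: the two gold pieces at `(1, 4)` appear as `(- 2 -)` and the three traps at `(3, 1)` as `(- - 3)`.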

In [2]:
environment = PartiallyObservableEnvironment()

reflex_agent = ReflexAgent()
environment.add_thing(reflex_agent)

# Five pieces of gold at fixed (row, column) positions; (1, 4) holds two.
for position in [(4, 0), (0, 1), (2, 3), (1, 4), (1, 4)]:
    environment.add_thing(Gold(), position)

# Six traps at fixed (row, column) positions; (3, 1) holds three.
for position in [(1, 0), (3, 1), (3, 1), (3, 1), (2, 3), (4, 4)]:
    environment.add_thing(Trap(), position)

environment.run()

---------------------------
Initial State
---------------------------
     0       1       2       3       4
  (A G T) (A G T) (A G T) (A G T) (A G T)
0 (- - -) (- 1 -) (- - -) (- - -) (- - -) 

1 (- - 1) (- - -) (- - -) (- - -) (- 2 -) 

2 (- - -) (- - -) (- - -) (- 1 1) (- - -) 

3 (- - -) (- - 3) (- - -) (U - -) (- - -) 

4 (- 1 -) (- - -) (- - -) (- - -) (- - 1) 

Percept
(- - -) (- 1 1) (- - -) 
(- - -) (U - -) (- - -) 
(- - -) (- - -) (- - 1) 

Agent state: (3, 3, UP)
Agent performance: 100

---------------------------
Run details
---------------------------
<STEP 1>
SELECTED ACTION:  ADVANCE
Agent state:  (2, 3, UP)
Agent performance: 104

Environment: 
     0       1       2       3       4
  (A G T) (A G T) (A G T) (A G T) (A G T)
0 (- - -) (- 1 -) (- - -) (- - -) (- - -) 

1 (- - 1) (- - -) (- - -) (- - -) (- 2 -) 

2 (- - -) (- - -) (- - -) (U - -) (- - -) 

3 (- - -) (- - 3) (- - -) (- - -) (- - -) 

4 (- 1 -) (- - -) (- - -) (- - -) (- - 1) 

Percept
(- - -) (- - -) (- 2 -) 

The second agent tested in the Partially Observable Environment is the Model-Based Agent, with the gold and traps at the same positions as in the previous example.

In [3]:
environment = PartiallyObservableEnvironment()

model_agent = ModelBasedAgent()
environment.add_thing(model_agent)

# Five pieces of gold at fixed (row, column) positions; (1, 4) holds two.
for position in [(4, 0), (0, 1), (2, 3), (1, 4), (1, 4)]:
    environment.add_thing(Gold(), position)

# Six traps at fixed (row, column) positions; (3, 1) holds three.
for position in [(1, 0), (3, 1), (3, 1), (3, 1), (2, 3), (4, 4)]:
    environment.add_thing(Trap(), position)

environment.run()

Percept
(- - -) (- - -) (- - -) 
(- - -) (U - -) (- 2 -) 
(- - -) (- 1 1) (- - -) 

Agent internal state
(? ? ?) (? ? ?) (- - -) (- - -) (- - -) 
(? ? ?) (? ? ?) (- - -) (V - -) (- 2 -) 
(? ? ?) (? ? ?) (- - -) (- 1 1) (- - -) 
(? ? ?) (? ? ?) (? ? ?) (? ? ?) (? ? ?) 
(? ? ?) (? ? ?) (? ? ?) (? ? ?) (? ? ?) 

Agent state: (1, 3, UP)
Agent performance: 100

---------------------------
Run details
---------------------------
<STEP 1>
SELECTED ACTION:  TURN
Agent state:  (1, 3, RIGHT)
Agent performance: 99

Environment: 
     0       1       2       3       4
  (A G T) (A G T) (A G T) (A G T) (A G T)
0 (- - -) (- 1 -) (- - -) (- - -) (- - -) 

1 (- - 1) (- - -) (- - -) (R - -) (- 2 -) 

2 (- - -) (- - -) (- - -) (- 1 1) (- - -) 

3 (- - -) (- - 3) (- - -) (- - -) (- - -) 

4 (- 1 -) (- - -) (- - -) (- - -) (- - 1) 

Percept
(- - -) (- - -) (- - -) 
(- - -) (R - -) (- 2 -) 
(- - -) (- 1 1) (- - -) 

Agent internal state
(? ? ?) (? ? ?) (- - -) (- - -) (- - -) 
(? ? ?) (? ? ?) (- - -) (V - -) (- 2 -)
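
In the internal-state printouts above, cells the agent has never perceived are shown as `?`, while cells inside a percept are filled in. A minimal sketch of such an update is shown here, assuming a 3x3 percept centred on the agent and a model stored as a list of lists. The function `update_model`, the `UNKNOWN` constant and the one-character cell contents are hypothetical simplifications, not the data structures of `ModelBasedAgent`.

```python
from typing import List, Tuple

UNKNOWN = "?"  # marker for cells the agent has not perceived yet


def update_model(model: List[List[str]], percept: List[List[str]],
                 agent_pos: Tuple[int, int]) -> None:
    """Copy a 3x3 percept centred on agent_pos into the model, in place."""
    rows, cols = len(model), len(model[0])
    agent_row, agent_col = agent_pos
    for d_row in range(3):
        for d_col in range(3):
            row = agent_row + d_row - 1
            col = agent_col + d_col - 1
            # Percept cells that fall outside the grid are ignored.
            if 0 <= row < rows and 0 <= col < cols:
                model[row][col] = percept[d_row][d_col]


# Start with a fully unknown 5x5 world and apply one percept at (1, 3).
model = [[UNKNOWN] * 5 for _ in range(5)]
percept = [["-", "-", "-"],
           ["-", "A", "2"],   # "A" marks the agent, "2" a cell with two golds
           ["-", "1", "-"]]
update_model(model, percept, (1, 3))
for row in model:
    print(" ".join(row))
```

After this single update, the nine cells around `(1, 3)` are known and the remaining sixteen are still `?`, the same known/unknown pattern as in the printout. Each later percept overwrites only the cells it covers, so information about gold seen earlier stays in the model.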

After both agents have run in the Partially Observable Environment, we can inspect the final performance of the <u>Reflex Agent</u>:

In [4]:
reflex_agent.performance

99

... and the <u>Model-Based Agent</u>:

In [5]:
model_agent.performance

99

<h1><b>Fully Observable Environment</b></h1>

In the second part of the homework we use the Fully Observable Environment, starting with the Reflex Agent. As in the previous exercise, the gold and traps are placed at fixed positions.

In [6]:
environment = FullyObservableEnvironment()

reflex_agent = ReflexAgent()
environment.add_thing(reflex_agent)

# Five pieces of gold at fixed (row, column) positions; (1, 4) holds two.
for position in [(4, 0), (0, 1), (2, 3), (1, 4), (1, 4)]:
    environment.add_thing(Gold(), position)

# Six traps at fixed (row, column) positions; (3, 1) holds three.
for position in [(1, 0), (3, 1), (3, 1), (3, 1), (2, 3), (4, 4)]:
    environment.add_thing(Trap(), position)

environment.run()

---------------------------
Initial State
---------------------------
     0       1       2       3       4
  (A G T) (A G T) (A G T) (A G T) (A G T)
0 (- - -) (- 1 -) (- - -) (- - -) (- - -) 

1 (- - 1) (- - -) (- - -) (- - -) (- 2 -) 

2 (- - -) (- - -) (- - -) (- 1 1) (- - -) 

3 (- - -) (- - 3) (- - -) (- - -) (- - -) 

4 (- 1 -) (U - -) (- - -) (- - -) (- - 1) 

Agent state: (4, 1, UP)
Agent performance: 100

---------------------------
Run details
---------------------------
<STEP 1>
SELECTED ACTION:  TURN
Agent state:  (4, 1, RIGHT)
Agent performance: 99

Environment: 
     0       1       2       3       4
  (A G T) (A G T) (A G T) (A G T) (A G T)
0 (- - -) (- 1 -) (- - -) (- - -) (- - -) 

1 (- - 1) (- - -) (- - -) (- - -) (- 2 -) 

2 (- - -) (- - -) (- - -) (- 1 1) (- - -) 

3 (- - -) (- - 3) (- - -) (- - -) (- - -) 

4 (- 1 -) (R - -) (- - -) (- - -) (- - 1) 

<STEP 2>
SELECTED ACTION:  TURN
Agent state:  (4, 1, DOWN)
Agent performance: 98

Environment: 
     0       1     

Next, we run the Model-Based Agent in the Fully Observable Environment with the same layout.

In [7]:
environment = FullyObservableEnvironment()

model_agent = ModelBasedAgent()
environment.add_thing(model_agent)

# Five pieces of gold at fixed (row, column) positions; (1, 4) holds two.
for position in [(4, 0), (0, 1), (2, 3), (1, 4), (1, 4)]:
    environment.add_thing(Gold(), position)

# Six traps at fixed (row, column) positions; (3, 1) holds three.
for position in [(1, 0), (3, 1), (3, 1), (3, 1), (2, 3), (4, 4)]:
    environment.add_thing(Trap(), position)

environment.run()

---------------------------
Initial State
---------------------------
     0       1       2       3       4
  (A G T) (A G T) (A G T) (A G T) (A G T)
0 (- - -) (- 1 -) (- - -) (- - -) (- - -) 

1 (- - 1) (- - -) (- - -) (- - -) (- 2 -) 

2 (- - -) (- - -) (- - -) (- 1 1) (- - -) 

3 (- - -) (- - 3) (- - -) (- - -) (U - -) 

4 (- 1 -) (- - -) (- - -) (- - -) (- - 1) 

Agent internal state
(- - -) (- 1 -) (- - -) (- - -) (- - -) 
(- - 1) (- - -) (- - -) (- - -) (- 2 -) 
(- - -) (- - -) (- - -) (- 1 1) (- - -) 
(- - -) (- - 3) (- - -) (- - -) (V - -) 
(- 1 -) (- - -) (- - -) (- - -) (- - 1) 

Agent state: (3, 4, UP)
Agent performance: 100

---------------------------
Run details
---------------------------
<STEP 1>
SELECTED ACTION:  TURN
Agent state:  (3, 4, RIGHT)
Agent performance: 99

Environment: 
     0       1       2       3       4
  (A G T) (A G T) (A G T) (A G T) (A G T)
0 (- - -) (- 1 -) (- - -) (- - -) (- - -) 

1 (- - 1) (- - -) (- - -) (- - -) (- 2 -) 

2 (- - -) (- - -) (-

Again, we can inspect the final performance of the <u>Reflex Agent</u>:

In [8]:
reflex_agent.performance

117

... and the <u>Model-Based Agent</u> in the Fully Observable Environment:

In [9]:
model_agent.performance

112

<h1><b>Conclusions</b></h1>

<ul>
    <li><b>Which agent behaves better in the Partially Observable Environment?</b></li>
        &emsp; In the vast majority of our tests, the Model-Based Agent obtained better results. This makes sense: although it never receives the full state of the environment and may not currently see the gold it is looking for, it keeps in its model the gold pieces it has already seen in earlier percepts. The Reflex Agent, by contrast, must explore the world whenever it perceives no gold, which may lead it into traps.
    <li><b>Which agent behaves better in the Fully Observable Environment?</b></li>
        &emsp; In this kind of environment both agents obtained similar results because they did not have to search for gold: their percepts always contained the exact position of every piece of gold in the environment.
    <li><b>Are the agents behaving rationally?</b></li>
        &emsp; Yes, to some extent. Whenever they are in the same row or column as a piece of gold, they go for it; otherwise they explore. However, they do not always try to avoid traps.
    <li><b>Which is better for picking up all the gold in the environment: fewer or more steps?</b></li>
        &emsp; It depends on the number of gold pieces in the environment. If there are only a few, it is better to give the agent a small number of steps, because an agent that has collected all the gold and does not stop keeps losing performance. Fortunately, the agents in this exercise do stop when no gold is left.
    <li><b>Was it fair to test with gold pieces and traps in fixed positions? Why not in random positions?</b></li>
        &emsp; Yes, it was fair. Random positions would not have been, because one agent might have faced a more difficult layout than the other.
</ul>
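
The fairness concern in the last answer comes from each agent receiving a different layout. If random layouts were ever wanted, one possible way to keep the comparison fair would be to generate the layout from a seeded generator and reuse it for both agents. The sketch below illustrates that idea; `random_layout` and its parameters are hypothetical and were not used in the experiments above.

```python
import random
from typing import List, Tuple

Position = Tuple[int, int]


def random_layout(seed: int, n_gold: int = 5, n_traps: int = 6,
                  size: int = 5) -> Tuple[List[Position], List[Position]]:
    """Return (gold_positions, trap_positions) drawn from a seeded generator."""
    rng = random.Random(seed)  # private generator: same seed, same layout

    def draw() -> Position:
        return (rng.randrange(size), rng.randrange(size))

    gold = [draw() for _ in range(n_gold)]
    traps = [draw() for _ in range(n_traps)]
    return gold, traps


# Both agents would be tested on the layout generated from the same seed.
layout_for_reflex = random_layout(seed=42)
layout_for_model = random_layout(seed=42)
print(layout_for_reflex == layout_for_model)
```

Averaging each agent's performance over several seeds would then compare the agents on many layouts while still giving both the same ones.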

