First, we import the necessary libraries.

The imported classes include the following:
<ul>
    <li><b>FullyObservableEnvironment:</b></li>
        &emsp; This class implements an environment in which the agent can perceive the positions of all gold pieces and traps. All other relevant parts of the environment are also visible.
    <li><b>PartiallyObservableEnvironment:</b></li>
        &emsp; In this type of environment part of the state is hidden; that is, the agent can never see the entire state of the environment. Solving this kind of environment requires agents with memory.
    <li><b>ReflexAgent:</b></li>
        &emsp; This class implements a simple reflex agent, which acts only on the basis of the percepts it receives from the environment. Its actions are chosen by condition-action rules.
    <li><b>ModelBasedAgent:</b></li>
        &emsp; This kind of agent maintains an internal structure that describes the part of the world it cannot currently see. This knowledge is called the model of the world.
</ul>
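
The difference between the two agent types can be illustrated with a minimal sketch. This is not the code of `ReflexAgent` or `ModelBasedAgent`: the percept format (a dict mapping positions to contents), the action names and all function and class names are assumptions made for this illustration. The rule itself ("go for gold in the same row or column, otherwise explore") is the one described in the conclusions.

```python
# Illustrative sketch only: the percept format and all names here are
# hypothetical, not the API of ReflexAgent or ModelBasedAgent.

def reflex_choose(percept, position):
    """Condition-action rule that uses only the current percept."""
    row, col = position
    for (r, c), content in percept.items():
        if content == "gold" and (r == row or c == col):
            return "GO_FOR_GOLD"
    return "EXPLORE"


class ModelBasedChooser:
    """Remembers everything seen so far and decides from that memory."""

    def __init__(self):
        self.model = {}  # position -> last known content

    def choose(self, percept, position):
        self.model.update(percept)  # remember what was just perceived
        row, col = position
        for (r, c), content in self.model.items():
            if content == "gold" and (r == row or c == col):
                return "GO_FOR_GOLD"
        return "EXPLORE"


# Gold is seen at (2, 3) in the first percept but not in the second one.
first = {(2, 3): "gold", (1, 2): "empty"}
second = {(2, 1): "empty", (2, 0): "empty"}

chooser = ModelBasedChooser()
chooser.choose(first, (1, 2))
print(reflex_choose(second, (2, 1)))   # EXPLORE: the gold is out of view
print(chooser.choose(second, (2, 1)))  # GO_FOR_GOLD: the model remembers it
```

The sketch never removes collected gold from the model; a real agent would also have to update its model when the world changes.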

In [1]:
import numpy as np
import random

from FullyObservableEnvironment import FullyObservableEnvironment
from PartiallyObservableEnvironment import PartiallyObservableEnvironment
from ReflexAgent import ReflexAgent
from ModelBasedAgent import ModelBasedAgent
from Objects import Gold, Trap

<h1><b>Partially Observable Environment</b></h1>

The first agent we test is the Reflex Agent in a Partially Observable Environment.
In addition to the agent, we add five pieces of gold and six traps at fixed positions.
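
The percepts printed in the run output suggest that, in this environment, the agent sees the 3x3 neighbourhood around its own cell, clipped at the border of the grid. A minimal sketch of that window computation follows; `percept_bounds` is a hypothetical helper and the 5x5 grid size is read off the printed grids, so neither is taken from `PartiallyObservableEnvironment` itself.

```python
# Hypothetical helper: not part of PartiallyObservableEnvironment.
GRID_SIZE = 5  # assumed from the 5x5 grids printed in the output


def percept_bounds(row, col, radius=1, size=GRID_SIZE):
    """Return the rows and columns visible from (row, col), clipped to the grid."""
    rows = range(max(0, row - radius), min(size, row + radius + 1))
    cols = range(max(0, col - radius), min(size, col + radius + 1))
    return list(rows), list(cols)


print(percept_bounds(1, 2))  # ([0, 1, 2], [1, 2, 3])
print(percept_bounds(2, 4))  # ([1, 2, 3], [3, 4])
```

Both results agree with the printed percepts: rows 0 to 2 and columns 1 to 3 for an agent at (1, 2), and rows 1 to 3 with only columns 3 and 4 for an agent at the right edge, (2, 4).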

In [2]:
environment = PartiallyObservableEnvironment()

reflex_agent = ReflexAgent()
environment.add_thing(reflex_agent)

# Five gold pieces at fixed (row, column) positions; (1, 4) holds two.
for position in [(4, 0), (0, 1), (2, 3), (1, 4), (1, 4)]:
    environment.add_thing(Gold(), position)

# Six traps at fixed positions; (3, 1) holds three.
for position in [(1, 0), (3, 1), (3, 1), (3, 1), (2, 3), (4, 4)]:
    environment.add_thing(Trap(), position)

environment.run()

---------------------------
Initial State
---------------------------
     0       1       2       3       4
  (A G T) (A G T) (A G T) (A G T) (A G T)
0 (- - -) (- 1 -) (- - -) (- - -) (- - -) 

1 (- - 1) (- - -) (R - -) (- - -) (- 2 -) 

2 (- - -) (- - -) (- - -) (- 1 1) (- - -) 

3 (- - -) (- - 3) (- - -) (- - -) (- - -) 

4 (- 1 -) (- - -) (- - -) (- - -) (- - 1) 

Percept
     1       2       3    
0 (- 1 -) (- - -) (- - -) 
1 (- - -) (R - -) (- - -) 
2 (- - -) (- - -) (- 1 1) 

Agent state: (1, 2, RIGHT)
Agent performance: 100

---------------------------
Run details
---------------------------
<STEP 1>
SELECTED ACTION:  TURN
Agent state:  (1, 2, DOWN)
Agent performance: 99

Environment: 
     0       1       2       3       4
  (A G T) (A G T) (A G T) (A G T) (A G T)
0 (- - -) (- 1 -) (- - -) (- - -) (- - -) 

1 (- - 1) (- - -) (D - -) (- - -) (- 2 -) 

2 (- - -) (- - -) (- - -) (- 1 1) (- - -) 

3 (- - -) (- - 3) (- - -) (- - -) (- - -) 

4 (- 1 -) (- - -) (- - -) (- - -) (- - 1

The second agent in the Partially Observable Environment is the Model-Based Agent, which is tested with gold and traps at the same positions as in the previous example.
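
The "Agent internal state" grids in the run output mark cells the agent has not perceived yet with `?` and cells it has visited with `V`. A minimal sketch of how such a grid could be maintained is shown here. The class name `WorldModel`, the string cell contents and the dict-based percept format are assumptions; the actual `ModelBasedAgent` may work differently.

```python
# Hypothetical sketch of an internal world model; not the ModelBasedAgent code.
UNKNOWN = "?"
VISITED = "V"


class WorldModel:
    def __init__(self, size=5):
        # Every cell starts unknown until it appears in a percept.
        self.cells = [[UNKNOWN] * size for _ in range(size)]

    def update(self, percept, agent_position):
        """Merge a percept (dict of (row, col) -> content) into the model."""
        for (row, col), content in percept.items():
            if self.cells[row][col] != VISITED:
                self.cells[row][col] = content
        row, col = agent_position
        self.cells[row][col] = VISITED

    def known_gold(self):
        """Positions the model remembers as containing gold."""
        return [(r, c)
                for r, line in enumerate(self.cells)
                for c, content in enumerate(line)
                if content == "gold"]


model = WorldModel()
model.update({(2, 3): "gold", (2, 4): "empty"}, agent_position=(2, 4))
model.update({(3, 4): "empty"}, agent_position=(3, 4))  # gold no longer in view
print(model.known_gold())  # [(2, 3)]
print(model.cells[0][0])   # ?
```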

In [3]:
environment = PartiallyObservableEnvironment()

model_agent = ModelBasedAgent()
environment.add_thing(model_agent)

# Five gold pieces at fixed (row, column) positions; (1, 4) holds two.
for position in [(4, 0), (0, 1), (2, 3), (1, 4), (1, 4)]:
    environment.add_thing(Gold(), position)

# Six traps at fixed positions; (3, 1) holds three.
for position in [(1, 0), (3, 1), (3, 1), (3, 1), (2, 3), (4, 4)]:
    environment.add_thing(Trap(), position)

environment.run()

 (? ? ?) (? ? ?) (- - -) (- - -) 
1 (? ? ?) (? ? ?) (? ? ?) (- - -) (V - -) 
2 (? ? ?) (? ? ?) (? ? ?) (- 1 1) (V - -) 
3 (? ? ?) (? ? ?) (? ? ?) (- - -) (- - -) 
4 (? ? ?) (? ? ?) (? ? ?) (? ? ?) (? ? ?) 

<STEP 8>
SELECTED ACTION:  ADVANCE
Agent state:  (2, 4, DOWN)
Agent performance: 111

Environment: 
     0       1       2       3       4
  (A G T) (A G T) (A G T) (A G T) (A G T)
0 (- - -) (- 1 -) (- - -) (- - -) (- - -) 

1 (- - 1) (- - -) (- - -) (- - -) (- - -) 

2 (- - -) (- - -) (- - -) (- 1 1) (D - -) 

3 (- - -) (- - 3) (- - -) (- - -) (- - -) 

4 (- 1 -) (- - -) (- - -) (- - -) (- - 1) 

Percept:
     3       4    
1 (- - -) (- - -) 
2 (- 1 1) (D - -) 
3 (- - -) (- - -) 

Agent internal state:
     0       1       2       3       4
  (A G T) (A G T) (A G T) (A G T) (A G T)
0 (? ? ?) (? ? ?) (? ? ?) (- - -) (- - -) 
1 (? ? ?) (? ? ?) (? ? ?) (- - -) (V - -) 
2 (? ? ?) (? ? ?) (? ? ?) (- 1 1) (V - -) 
3 (? ? ?) (? ? ?) (? ? ?) (- - -) (- - -) 
4 (? ? ?) (? ? ?) (? ? ?) (? ? 

After running both agents in the Partially Observable Environment, we can inspect the final performance of the <u>Reflex Agent</u>:

In [4]:
reflex_agent.performance

93

... and the <u>Model-Based Agent</u>:

In [5]:
model_agent.performance

96

<h1><b>Fully Observable Environment</b></h1>

In this second part of the homework we use the Fully Observable Environment, first with the Reflex Agent. As in the previous exercise, gold and traps are placed at fixed positions.

In [6]:
environment = FullyObservableEnvironment()

reflex_agent = ReflexAgent()
environment.add_thing(reflex_agent)

# Five gold pieces at fixed (row, column) positions; (1, 4) holds two.
for position in [(4, 0), (0, 1), (2, 3), (1, 4), (1, 4)]:
    environment.add_thing(Gold(), position)

# Six traps at fixed positions; (3, 1) holds three.
for position in [(1, 0), (3, 1), (3, 1), (3, 1), (2, 3), (4, 4)]:
    environment.add_thing(Trap(), position)

environment.run()

---------------------------
Initial State
---------------------------
     0       1       2       3       4
  (A G T) (A G T) (A G T) (A G T) (A G T)
0 (- - -) (- 1 -) (- - -) (- - -) (- - -) 

1 (R - 1) (- - -) (- - -) (- - -) (- 2 -) 

2 (- - -) (- - -) (- - -) (- 1 1) (- - -) 

3 (- - -) (- - 3) (- - -) (- - -) (- - -) 

4 (- 1 -) (- - -) (- - -) (- - -) (- - 1) 

Agent state: (1, 0, RIGHT)
Agent performance: 100

---------------------------
Run details
---------------------------
<STEP 1>
SELECTED ACTION:  TURN
Agent state:  (1, 0, DOWN)
Agent performance: 94

Environment: 
     0       1       2       3       4
  (A G T) (A G T) (A G T) (A G T) (A G T)
0 (- - -) (- 1 -) (- - -) (- - -) (- - -) 

1 (D - -) (- - -) (- - -) (- - -) (- 2 -) 

2 (- - -) (- - -) (- - -) (- 1 1) (- - -) 

3 (- - -) (- - 3) (- - -) (- - -) (- - -) 

4 (- 1 -) (- - -) (- - -) (- - -) (- - 1) 

<STEP 2>
SELECTED ACTION:  TURN
Agent state:  (1, 0, LEFT)
Agent performance: 93

Environment: 
     0       1   

Next, we run the Model-Based Agent in the Fully Observable Environment.

In [7]:
environment = FullyObservableEnvironment()

model_agent = ModelBasedAgent()
environment.add_thing(model_agent)

# Five gold pieces at fixed (row, column) positions; (1, 4) holds two.
for position in [(4, 0), (0, 1), (2, 3), (1, 4), (1, 4)]:
    environment.add_thing(Gold(), position)

# Six traps at fixed positions; (3, 1) holds three.
for position in [(1, 0), (3, 1), (3, 1), (3, 1), (2, 3), (4, 4)]:
    environment.add_thing(Trap(), position)

environment.run()

---------------------------
Initial State
---------------------------
     0       1       2       3       4
  (A G T) (A G T) (A G T) (A G T) (A G T)
0 (- - -) (- 1 -) (- - -) (- - -) (- - -) 

1 (- - 1) (R - -) (- - -) (- - -) (- 2 -) 

2 (- - -) (- - -) (- - -) (- 1 1) (- - -) 

3 (- - -) (- - 3) (- - -) (- - -) (- - -) 

4 (- 1 -) (- - -) (- - -) (- - -) (- - 1) 

Agent internal state:
     0       1       2       3       4
  (A G T) (A G T) (A G T) (A G T) (A G T)
0 (- - -) (- 1 -) (- - -) (- - -) (- - -) 
1 (- - 1) (V - -) (- - -) (- - -) (- 2 -) 
2 (- - -) (- - -) (- - -) (- 1 1) (- - -) 
3 (- - -) (- - 3) (- - -) (- - -) (- - -) 
4 (- 1 -) (- - -) (- - -) (- - -) (- - 1) 

Agent state: (1, 1, RIGHT)
Agent performance: 100

---------------------------
Run details
---------------------------
<STEP 1>
SELECTED ACTION:  TURN
Agent state:  (1, 1, DOWN)
Agent performance: 99

Environment: 
     0       1       2       3       4
  (A G T) (A G T) (A G T) (A G T) (A G T)
0 (- - -) (- 1

We can also see the performance of the <u>Reflex Agent</u>:

In [8]:
reflex_agent.performance

106

... and the <u>Model-Based Agent</u> in the Fully Observable Environment:

In [9]:
model_agent.performance

114

<h1>Additional tests:</h1>
In addition to the previous tests, we can run the <u>Reflex Agent</u> several times in the <u>Partially Observable Environment</u> with gold and traps placed at random positions...
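
The four random-test cells in this section differ only in the environment and agent classes they use, so the repeated setup could be factored into one helper. The sketch below assumes only what the cells already rely on: `add_thing()`, `run()` and the agent's `performance` attribute. The name `run_trials` and the stub classes are hypothetical; the stubs exist only so that the example runs outside the notebook.

```python
# Sketch of a reusable trial runner (hypothetical helper, not part of the
# homework code). It relies only on add_thing(), run() and `performance`.

def run_trials(make_environment, make_agent, make_gold, make_trap,
               number_of_tests=5, gold_count=5, trap_count=6):
    """Run independent trials and return the list of final performances."""
    scores = []
    for _ in range(number_of_tests):
        environment = make_environment()
        agent = make_agent()
        environment.add_thing(agent)
        for _ in range(gold_count):
            environment.add_thing(make_gold())
        for _ in range(trap_count):
            environment.add_thing(make_trap())
        environment.run()
        scores.append(agent.performance)
    return scores


# Stand-in classes so the sketch runs on its own; in the notebook the real
# environment, agent, Gold and Trap classes would be passed instead.
class StubAgent:
    performance = 100


class StubEnvironment:
    def __init__(self):
        self.things = []

    def add_thing(self, thing, position=None):
        self.things.append(thing)

    def run(self):
        pass  # a real environment would step the agent here


scores = run_trials(StubEnvironment, StubAgent, object, object, number_of_tests=3)
print(scores)                     # [100, 100, 100]
print(sum(scores) / len(scores))  # 100.0
```

In the notebook the call would look like `run_trials(PartiallyObservableEnvironment, ReflexAgent, Gold, Trap)`, and the returned list plays the role of `fitness_PartiallyObservableReflex`.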

In [12]:
numberOfTests = 5
totalFitness_PartiallyObservableReflex = 0
fitness_PartiallyObservableReflex = []
for _ in range(numberOfTests):
    environment = PartiallyObservableEnvironment()

    reflex_agent = ReflexAgent()
    environment.add_thing(reflex_agent)

    # Five gold pieces and six traps, placed at random because no position is given.
    for _ in range(5):
        environment.add_thing(Gold())
    for _ in range(6):
        environment.add_thing(Trap())

    environment.run()

    totalFitness_PartiallyObservableReflex += reflex_agent.performance
    fitness_PartiallyObservableReflex.append(reflex_agent.performance)

 (- - -) (- - 1) (- 1 1) (- - -) 

4 (- - -) (- - -) (- - -) (- - -) (- - -) 

Percept
     1       2       3    
1 (- - -) (- - -) (- - -) 
2 (- - -) (R - -) (- - -) 
3 (- - -) (- - 1) (- 1 1) 

<STEP 14>
SELECTED ACTION:  ADVANCE
Agent state:  (2, 3, RIGHT)
Agent performance: 116

Environment: 
     0       1       2       3       4
  (A G T) (A G T) (A G T) (A G T) (A G T)
0 (- 1 1) (- - 1) (- - -) (- - -) (- - -) 

1 (- - 1) (- - -) (- - -) (- - -) (- - 1) 

2 (- - -) (- - -) (- - -) (R - -) (- - -) 

3 (- - -) (- - -) (- - 1) (- 1 1) (- - -) 

4 (- - -) (- - -) (- - -) (- - -) (- - -) 

Percept
     2       3       4    
1 (- - -) (- - -) (- - 1) 
2 (- - -) (R - -) (- - -) 
3 (- - 1) (- 1 1) (- - -) 

<STEP 15>
SELECTED ACTION:  TURN
Agent state:  (2, 3, DOWN)
Agent performance: 115

Environment: 
     0       1       2       3       4
  (A G T) (A G T) (A G T) (A G T) (A G T)
0 (- 1 1) (- - 1) (- - -) (- - -) (- - -) 

1 (- - 1) (- - -) (- - -) (- - -) (- - 1) 

2 (- - -) (- - -)

... as well as the <u>Model-Based Agent</u> in the same kind of environment

In [15]:
numberOfTests = 5
totalFitness_PartiallyObservableModel = 0
fitness_PartiallyObservableModel = []
for _ in range(numberOfTests):
    environment = PartiallyObservableEnvironment()

    model_agent = ModelBasedAgent()
    environment.add_thing(model_agent)

    # Five gold pieces and six traps, placed at random because no position is given.
    for _ in range(5):
        environment.add_thing(Gold())
    for _ in range(6):
        environment.add_thing(Trap())

    environment.run()

    totalFitness_PartiallyObservableModel += model_agent.performance
    fitness_PartiallyObservableModel.append(model_agent.performance)

 2) (- - -) 
1 (- - -) (- - -) (V - -) (V - -) (- - 1) 
2 (- - -) (V - -) (V - -) (V - -) (- - -) 
3 (- - -) (V - -) (- 1 -) (V - -) (- - -) 
4 (- - -) (V - -) (V - -) (V - -) (- - -) 

<STEP 19>
SELECTED ACTION:  TURN
Agent state:  (3, 3, LEFT)
Agent performance: 116

Environment: 
     0       1       2       3       4
  (A G T) (A G T) (A G T) (A G T) (A G T)
0 (- - 1) (- - 1) (- - -) (- - 2) (- - -) 

1 (- - -) (- - -) (- - -) (- - -) (- - 1) 

2 (- - -) (- - -) (- - -) (- - -) (- - -) 

3 (- - -) (- - -) (- 1 -) (L - -) (- - -) 

4 (- - -) (- - -) (- - -) (- - -) (- - -) 

Percept:
     2       3       4    
2 (- - -) (- - -) (- - -) 
3 (- 1 -) (L - -) (- - -) 
4 (- - -) (- - -) (- - -) 

Agent internal state:
     0       1       2       3       4
  (A G T) (A G T) (A G T) (A G T) (A G T)
0 (? ? ?) (- - 1) (- - -) (- - 2) (- - -) 
1 (- - -) (- - -) (V - -) (V - -) (- - 1) 
2 (- - -) (V - -) (V - -) (V - -) (- - -) 
3 (- - -) (V - -) (- 1 -) (V - -) (- - -) 
4 (- - -) (V - -) (V -

... to finally see their average performance. The maximum performance obtained by the <u>Reflex Agent</u> in the <u>Partially Observable Environment</u> was:

In [14]:
np.max(fitness_PartiallyObservableReflex)

128

... and for the <u>Model-Based Agent</u>, the maximum performance was:

In [16]:
np.max(fitness_PartiallyObservableModel)

125

On average, the <u>Reflex Agent</u> had a performance of:

In [17]:
totalFitness_PartiallyObservableReflex/numberOfTests

113.4

... and the <u>Model-Based Agent</u> had a performance of:

In [18]:
totalFitness_PartiallyObservableModel/numberOfTests

116.2
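
Each of these averages comes from only five runs, so it can help to look at the spread of the scores as well as their mean. The sketch below uses Python's standard `statistics` module; `np.mean` and `np.std(..., ddof=1)` would give the same numbers. The score list is made-up illustrative data, not a result from this notebook.

```python
import statistics

# Made-up example scores for illustration only; not results from this notebook.
example_scores = [110, 120, 100, 130, 115]

mean = statistics.mean(example_scores)
spread = statistics.stdev(example_scores)  # sample standard deviation

print(mean)              # 115
print(round(spread, 2))  # 11.18
```

When the spread is large compared with the gap between two averages, five runs are probably too few to say which agent is better.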

In [19]:
numberOfTests = 5
totalFitness_FullyObservableReflex = 0
fitness_FullyObservableReflex = []
for _ in range(numberOfTests):
    environment = FullyObservableEnvironment()

    reflex_agent = ReflexAgent()
    environment.add_thing(reflex_agent)

    # Five gold pieces and six traps, placed at random because no position is given.
    for _ in range(5):
        environment.add_thing(Gold())
    for _ in range(6):
        environment.add_thing(Trap())

    environment.run()

    totalFitness_FullyObservableReflex += reflex_agent.performance
    fitness_FullyObservableReflex.append(reflex_agent.performance)

(A G T)
0 (- - -) (- - -) (- - -) (- 1 -) (- - 1) 

1 (- - 1) (- - -) (- - -) (- - -) (- 1 -) 

2 (- - -) (- - -) (- 1 1) (- - -) (- - -) 

3 (- - -) (- - -) (L - -) (- - -) (- - -) 

4 (- - 1) (- - -) (- 1 1) (- - 1) (- - -) 

<STEP 4>
SELECTED ACTION:  TURN
Agent state:  (3, 2, UP)
Agent performance: 106

Environment: 
     0       1       2       3       4
  (A G T) (A G T) (A G T) (A G T) (A G T)
0 (- - -) (- - -) (- - -) (- 1 -) (- - 1) 

1 (- - 1) (- - -) (- - -) (- - -) (- 1 -) 

2 (- - -) (- - -) (- 1 1) (- - -) (- - -) 

3 (- - -) (- - -) (U - -) (- - -) (- - -) 

4 (- - 1) (- - -) (- 1 1) (- - 1) (- - -) 

<STEP 5>
SELECTED ACTION:  ADVANCE
Agent state:  (2, 2, UP)
Agent performance: 110

Environment: 
     0       1       2       3       4
  (A G T) (A G T) (A G T) (A G T) (A G T)
0 (- - -) (- - -) (- - -) (- 1 -) (- - 1) 

1 (- - 1) (- - -) (- - -) (- - -) (- 1 -) 

2 (- - -) (- - -) (U - -) (- - -) (- - -) 

3 (- - -) (- - -) (- - -) (- - -) (- - -) 

4 (- - 1) (- - -) (- 

In [20]:
numberOfTests = 5
totalFitness_FullyObservableModel = 0
fitness_FullyObservableModel = []
for _ in range(numberOfTests):
    environment = FullyObservableEnvironment()

    model_agent = ModelBasedAgent()
    environment.add_thing(model_agent)

    # Five gold pieces and six traps, placed at random because no position is given.
    for _ in range(5):
        environment.add_thing(Gold())
    for _ in range(6):
        environment.add_thing(Trap())

    environment.run()

    totalFitness_FullyObservableModel += model_agent.performance
    fitness_FullyObservableModel.append(model_agent.performance)

(- - -) (- - -) 

1 (- 1 -) (- - -) (- - -) (- - -) (- - -) 

2 (- - 1) (- - -) (- - 2) (- - -) (- - -) 

3 (L - -) (- - -) (- - -) (- - 1) (- - -) 

4 (- - -) (- - -) (- - -) (- - -) (- - -) 

Agent internal state:
     0       1       2       3       4
  (A G T) (A G T) (A G T) (A G T) (A G T)
0 (- - -) (- 1 1) (- - -) (- - -) (- - -) 
1 (- 1 -) (- - -) (- - -) (- - -) (- - -) 
2 (- - 1) (- - -) (- - 2) (- - -) (- - -) 
3 (V - -) (V - -) (V - -) (- - 1) (- - -) 
4 (V - -) (- - -) (- - -) (- - -) (- - -) 

<STEP 13>
SELECTED ACTION:  TURN
Agent state:  (3, 0, UP)
Agent performance: 106

Environment: 
     0       1       2       3       4
  (A G T) (A G T) (A G T) (A G T) (A G T)
0 (- - -) (- 1 1) (- - -) (- - -) (- - -) 

1 (- 1 -) (- - -) (- - -) (- - -) (- - -) 

2 (- - 1) (- - -) (- - 2) (- - -) (- - -) 

3 (U - -) (- - -) (- - -) (- - 1) (- - -) 

4 (- - -) (- - -) (- - -) (- - -) (- - -) 

Agent internal state:
     0       1       2       3       4
  (A G T) (A G T) (A G T) (A 

The maximum performance obtained by the <u>Reflex Agent</u> in the <u>Fully Observable Environment</u> was:

In [21]:
np.max(fitness_FullyObservableReflex)

129

while the maximum performance over all runs for the <u>Model-Based Agent</u> in the same environment was:

In [22]:
np.max(fitness_FullyObservableModel)

128

On average, the performance of the <u>Reflex Agent</u> in the <u>Fully Observable Environment</u> was:

In [25]:
totalFitness_FullyObservableReflex/numberOfTests

118.2

and the average for the <u>Model-Based Agent</u>:

In [26]:
totalFitness_FullyObservableModel/numberOfTests

117.6

<h1>Conclusions</h1>

<ul>
    <li><b>Which agent behaves better in the Partially Observable Environment?:</b></li>
        &emsp; In the vast majority of the tests we carried out, the Model-Based Agent obtained better results. This makes sense: although it does not receive the full state of the environment and may not currently see the gold pieces it is looking for, it keeps in its model the gold pieces it has already seen in earlier percepts. The Reflex Agent, by contrast, must explore the world whenever it does not perceive any gold, which may lead it into traps.
    <li><b>Which agent behaves better in the Fully Observable Environment?:</b></li>
        &emsp; In this kind of environment both agents obtained similar results (averages of 118.2 and 117.6 in the random tests) because they did not have to search for gold: their percepts always contained the exact position of every gold piece in the environment.
    <li><b>Are the Agents behaving rationally?:</b></li>
        &emsp; Yes, to some extent. Whenever they are in the same row or column as a piece of gold, they go for it; otherwise they explore. However, they do not always try to avoid traps.
    <li><b>What is better for picking up all the gold in the environment? Fewer or more steps?:</b></li>
        &emsp; It depends on the number of gold pieces in the environment. If there are only a few, it is better to give the agent a small number of steps, because an agent that keeps moving after all the gold has been collected would continue to lose performance. Fortunately, the agents in this exercise stop when no gold is left.
    <li><b>Was it fair to test with gold pieces and traps in fixed positions? Why not in random positions?:</b></li>
        &emsp; Yes. Testing with random positions would not have been a fair comparison, because one agent might have received a more difficult layout than the other.
</ul>
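
If random layouts were to be used for a comparison, one way to keep it fair would be to give both agents exactly the same layouts by drawing the positions from a seeded generator. The sketch below illustrates the idea under the assumption of a 5x5 grid; `random_layout` is a hypothetical helper, and the positions it returns would be passed to `add_thing` in the same way as in the fixed-position cells.

```python
import random


def random_layout(seed, gold_count=5, trap_count=6, size=5):
    """Draw reproducible (row, column) positions for gold and traps."""
    rng = random.Random(seed)  # private generator: same seed, same layout

    def draw():
        return (rng.randrange(size), rng.randrange(size))

    gold = [draw() for _ in range(gold_count)]
    traps = [draw() for _ in range(trap_count)]
    return gold, traps


# Both agents would be tested on the layout produced by the same seed.
layout_for_reflex = random_layout(seed=0)
layout_for_model = random_layout(seed=0)
print(layout_for_reflex == layout_for_model)  # True
```

As in the fixed layout used earlier, several items may end up in the same cell.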

