First, we import the necessary libraries.

The imported classes include the following:
<ul>
    <li><b>FullyObservableEnvironment:</b></li>
        &emsp; This class represents an environment in which the agent can perceive every location that contains Gold or Traps. All other relevant parts of the environment are visible as well.
    <li><b>PartiallyObservableEnvironment:</b></li>
        &emsp; In this type of environment some states are hidden; that is, the agent(s) can never see the entire state of the environment. Solving this kind of environment requires agents with memory.
    <li><b>ReflexAgent:</b></li>
        &emsp; This class implements the Simple Reflex Agent, which acts only on the basis of the percepts it receives from the environment. Its actions are based on condition-action rules.
    <li><b>ModelBasedAgent:</b></li>
        &emsp; This kind of agent maintains an internal structure that describes the part of the world it cannot currently see. This knowledge is called a model of the world.
</ul>
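
To make the phrase "condition-action rules" concrete, here is a minimal sketch of how a simple reflex agent might choose an action from its current percept alone. It does not reproduce the `ReflexAgent` class imported below: the `Percept` fields, the rule order, and the `GRAB` action are assumptions made for illustration. Only the action names `TURN` and `ADVANCE` appear in the run output of this notebook.

```python
from typing import NamedTuple


class Percept(NamedTuple):
    """Hypothetical percept: what the agent senses right now (not the notebook's format)."""
    gold_here: bool
    trap_ahead: bool
    wall_ahead: bool


def reflex_action(percept: Percept) -> str:
    """Choose an action from fixed condition-action rules.

    The decision depends only on the current percept; nothing is remembered
    between calls, which is what makes this a *simple* reflex agent.
    """
    if percept.gold_here:                          # rule 1: gold in this cell
        return "GRAB"                              # hypothetical action name
    if percept.trap_ahead or percept.wall_ahead:   # rule 2: blocked or dangerous ahead
        return "TURN"
    return "ADVANCE"                               # default rule: keep moving


print(reflex_action(Percept(gold_here=False, trap_ahead=True, wall_ahead=False)))
```

Because the function keeps no state, two identical percepts always yield the same action, which is why such an agent can loop forever when part of the world is hidden.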

In [1]:
import numpy as np
import random

from FullyObservableEnvironment import FullyObservableEnvironment
from PartiallyObservableEnvironment import PartiallyObservableEnvironment
from ReflexAgent import ReflexAgent
from ModelBasedAgent import ModelBasedAgent
from Objects import *

<h1><b>Partially Observable Environment</b></h1>

The first agent to be tested is the Reflex Agent in a Partially Observable Environment.
Besides the agent, we add 5 pieces of gold and 6 traps at specified positions.

In [2]:
environment = PartiallyObservableEnvironment()

reflex_agent = ReflexAgent()
environment.add_thing(reflex_agent)

# Five pieces of gold; (1,4) is listed twice, so that cell holds two.
for position in [(4,0), (0,1), (2,3), (1,4), (1,4)]:
    environment.add_thing(Gold(), position)

# Six traps; (3,1) is listed three times, so that cell holds three.
for position in [(1,0), (3,1), (3,1), (3,1), (2,3), (4,4)]:
    environment.add_thing(Trap(), position)

# for _ in range(10):
#     trap = Trap()
#     environment.add_thing(trap)    

# print('---------------------------')
# print('Initial State of Environment')
# print('---------------------------')
# print("Agent state: %s" % agent)
# print("Agent performance: %s" % agent.performance)
# print('')
# print('Environment:')
# print(environment)

environment.run()

---------------------------
Initial State
---------------------------
     0       1       2       3       4
  (A G T) (A G T) (A G T) (A G T) (A G T)
0 (- - -) (- 1 -) (- - -) (- - -) (- - -) 

1 (- - 1) (- - -) (- - -) (- - -) (- 2 -) 

2 (- - -) (- - -) (- - -) (- 1 1) (- - -) 

3 (- - -) (U - 3) (- - -) (- - -) (- - -) 

4 (- 1 -) (- - -) (- - -) (- - -) (- - 1) 

Percept
(- - -) (- - -) (- - -) 
(- - -) (U - 3) (- - -) 
(- 1 -) (- - -) (- - -) 

Agent state: (3, 1, UP)
Agent performance: 100

---------------------------
Run details
---------------------------
<STEP 1>
SELECTED ACTION:  TURN
Agent state:  (3, 1, RIGHT)
Agent performance: 94

Environment: 
     0       1       2       3       4
  (A G T) (A G T) (A G T) (A G T) (A G T)
0 (- - -) (- 1 -) (- - -) (- - -) (- - -) 

1 (- - 1) (- - -) (- - -) (- - -) (- 2 -) 

2 (- - -) (- - -) (- - -) (- 1 1) (- - -) 

3 (- - -) (R - 2) (- - -) (- - -) (- - -) 

4 (- 1 -) (- - -) (- - -) (- - -) (- - 1) 

Percept
(- - -) (- - -) (- - -)

The second agent in the Partially Observable Environment is the Model-Based Agent, which is tested with gold and traps at the same positions as in the previous example.
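
The run output for this agent prints an "Agent internal state" grid in which unobserved cells appear as `?`. The sketch below shows one way such a model could be maintained: the agent copies each local percept into a map that starts out entirely unknown. This is an illustration under stated assumptions, not the code of `ModelBasedAgent`: the helper names `local_window` and `update_model`, the `UNKNOWN` marker, the one-cell sensing radius, and the choice to track only trap counts are all hypothetical. The two trap positions are taken from the setup used in this notebook.

```python
import numpy as np

UNKNOWN = -1  # hypothetical marker for cells the agent has not observed yet


def local_window(grid, position, radius=1):
    """Return the cells within `radius` of `position`, clipped at the grid edges,
    together with the (row, col) of the window's upper-left corner."""
    row, col = position
    top, left = max(row - radius, 0), max(col - radius, 0)
    window = grid[top:row + radius + 1, left:col + radius + 1]
    return window, (top, left)


def update_model(model, percept, top_left):
    """Copy a rectangular percept into the agent's internal map (in place)."""
    row, col = top_left
    height, width = percept.shape
    model[row:row + height, col:col + width] = percept
    return model


# Hypothetical 5x5 world that stores only the number of traps per cell.
traps = np.zeros((5, 5), dtype=int)
traps[2, 3] = 1
traps[4, 4] = 1

# The agent starts knowing nothing, then senses around cell (3, 4).
model = np.full((5, 5), UNKNOWN)
percept, corner = local_window(traps, (3, 4))
update_model(model, percept, corner)
print(model)
```

With the agent at the right edge of the grid, the window is clipped to three rows by two columns, so only six cells of the model become known after the first percept. Unlike a reflex agent, the agent keeps this map between steps and can use it to avoid traps it has already seen.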

In [3]:
environment = PartiallyObservableEnvironment()

model_agent = ModelBasedAgent()
environment.add_thing(model_agent)

# Five pieces of gold; (1,4) is listed twice, so that cell holds two.
for position in [(4,0), (0,1), (2,3), (1,4), (1,4)]:
    environment.add_thing(Gold(), position)

# Six traps; (3,1) is listed three times, so that cell holds three.
for position in [(1,0), (3,1), (3,1), (3,1), (2,3), (4,4)]:
    environment.add_thing(Trap(), position)

# for _ in range(10):
#     trap = Trap()
#     environment.add_thing(trap)    

# print('---------------------------')
# print('Initial State of Environment')
# print('---------------------------')
# print("Agent state: %s" % agent)
# print("Agent performance: %s" % agent.performance)
# print('')
# print('Environment:')
# print(environment)

environment.run()

---------------------------
Initial State
---------------------------
     0       1       2       3       4
  (A G T) (A G T) (A G T) (A G T) (A G T)
0 (- - -) (- 1 -) (- - -) (- - -) (- - -) 

1 (- - 1) (- - -) (- - -) (- - -) (- 2 -) 

2 (- - -) (- - -) (- - -) (- 1 1) (- - -) 

3 (- - -) (- - 3) (- - -) (- - -) (R - -) 

4 (- 1 -) (- - -) (- - -) (- - -) (- - 1) 

Percept
(- 1 1) (- - -) 
(- - -) (R - -) 
(- - -) (- - 1) 

Agent internal state
(? ? ?) (? ? ?) (? ? ?) (? ? ?) (? ? ?) 
(? ? ?) (? ? ?) (? ? ?) (? ? ?) (? ? ?) 
(? ? ?) (? ? ?) (? ? ?) (- 1 1) (- - -) 
(? ? ?) (? ? ?) (? ? ?) (- - -) (V - -) 
(? ? ?) (? ? ?) (? ? ?) (- - -) (- - 1) 

Agent state: (3, 4, RIGHT)
Agent performance: 100

---------------------------
Run details
---------------------------
<STEP 1>
SELECTED ACTION:  TURN
Agent state:  (3, 4, DOWN)
Agent performance: 99

Environment: 
     0       1       2       3       4
  (A G T) (A G T) (A G T) (A G T) (A G T)
0 (- - -) (- 1 -) (- - -) (- - -) (- - -) 

1 

After running both agents in the Partially Observable Environment, we can inspect the <u>Reflex Agent's</u> final performance:

In [4]:
reflex_agent.performance

110

... and the <u>Model-Based Agent</u>:

In [5]:
model_agent.performance

93

<h1><b>Fully Observable Environment</b></h1>

In this second part of the homework we use the Fully Observable Environment, starting with the Reflex Agent. As in the previous exercise, the gold and traps are placed at explicit positions.

In [6]:
environment = FullyObservableEnvironment()

reflex_agent = ReflexAgent()
environment.add_thing(reflex_agent)

# Five pieces of gold; (1,4) is listed twice, so that cell holds two.
for position in [(4,0), (0,1), (2,3), (1,4), (1,4)]:
    environment.add_thing(Gold(), position)

# Six traps; (3,1) is listed three times, so that cell holds three.
for position in [(1,0), (3,1), (3,1), (3,1), (2,3), (4,4)]:
    environment.add_thing(Trap(), position)

# for _ in range(10):
#     trap = Trap()
#     environment.add_thing(trap)    

# print('---------------------------')
# print('Initial State of Environment')
# print('---------------------------')
# print("Agent state: %s" % agent)
# print("Agent performance: %s" % agent.performance)
# print('')
# print('Environment:')
# print(environment)

environment.run()

---------------------------
Initial State
---------------------------
     0       1       2       3       4
  (A G T) (A G T) (A G T) (A G T) (A G T)
0 (- - -) (- 1 -) (- - -) (- - -) (- - -) 

1 (- - 1) (U - -) (- - -) (- - -) (- 2 -) 

2 (- - -) (- - -) (- - -) (- 1 1) (- - -) 

3 (- - -) (- - 3) (- - -) (- - -) (- - -) 

4 (- 1 -) (- - -) (- - -) (- - -) (- - 1) 

Agent state: (1, 1, UP)
Agent performance: 100

---------------------------
Run details
---------------------------
<STEP 1>
SELECTED ACTION:  ADVANCE
Agent state:  (0, 1, UP)
Agent performance: 109

Environment: 
     0       1       2       3       4
  (A G T) (A G T) (A G T) (A G T) (A G T)
0 (- - -) (U - -) (- - -) (- - -) (- - -) 

1 (- - 1) (- - -) (- - -) (- - -) (- 2 -) 

2 (- - -) (- - -) (- - -) (- 1 1) (- - -) 

3 (- - -) (- - 3) (- - -) (- - -) (- - -) 

4 (- 1 -) (- - -) (- - -) (- - -) (- - 1) 

<STEP 2>
SELECTED ACTION:  TURN
Agent state:  (0, 1, RIGHT)
Agent performance: 108

Environment: 
     0       1  

Next, we run the Model-Based Agent in the Fully Observable Environment, with the gold and traps at the same positions.

In [7]:
environment = FullyObservableEnvironment()

model_agent = ModelBasedAgent()
environment.add_thing(model_agent)

# Five pieces of gold; (1,4) is listed twice, so that cell holds two.
for position in [(4,0), (0,1), (2,3), (1,4), (1,4)]:
    environment.add_thing(Gold(), position)

# Six traps; (3,1) is listed three times, so that cell holds three.
for position in [(1,0), (3,1), (3,1), (3,1), (2,3), (4,4)]:
    environment.add_thing(Trap(), position)

# for _ in range(10):
#     trap = Trap()
#     environment.add_thing(trap)    

# print('---------------------------')
# print('Initial State of Environment')
# print('---------------------------')
# print("Agent state: %s" % agent)
# print("Agent performance: %s" % agent.performance)
# print('')
# print('Environment:')
# print(environment)

environment.run()

---------------------------
Initial State
---------------------------
     0       1       2       3       4
  (A G T) (A G T) (A G T) (A G T) (A G T)
0 (- - -) (- 1 -) (- - -) (- - -) (- - -) 

1 (- - 1) (- - -) (- - -) (- - -) (- 2 -) 

2 (- - -) (R - -) (- - -) (- 1 1) (- - -) 

3 (- - -) (- - 3) (- - -) (- - -) (- - -) 

4 (- 1 -) (- - -) (- - -) (- - -) (- - 1) 

Agent internal state
(- - -) (- 1 -) (- - -) (- - -) (- - -) 
(- - 1) (- - -) (- - -) (- - -) (- 2 -) 
(- - -) (V - -) (- - -) (- 1 1) (- - -) 
(- - -) (- - 3) (- - -) (- - -) (- - -) 
(- 1 -) (- - -) (- - -) (- - -) (- - 1) 

Agent state: (2, 1, RIGHT)
Agent performance: 100

---------------------------
Run details
---------------------------
<STEP 1>
SELECTED ACTION:  TURN
Agent state:  (2, 1, DOWN)
Agent performance: 99

Environment: 
     0       1       2       3       4
  (A G T) (A G T) (A G T) (A G T) (A G T)
0 (- - -) (- 1 -) (- - -) (- - -) (- - -) 

1 (- - 1) (- - -) (- - -) (- - -) (- 2 -) 

2 (- - -) (D - -) 

We can also see the performance of the <u>Reflex Agent</u>:

In [8]:
reflex_agent.performance

117

... and the <u>Model-Based Agent</u> in the Fully Observable Environment:

In [9]:
model_agent.performance

113
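
To read the four results side by side, the values reported above can be collected into one small table. The sketch below uses only numbers printed in this notebook: each run starts at a performance of 100, and the four final values are 110, 93, 117, and 113. The helper name `net_change` is hypothetical.

```python
INITIAL_PERFORMANCE = 100  # printed under "Initial State" in every run above

# Final performance values as reported by the cells of this notebook.
final_performance = {
    ("Partially Observable", "Reflex Agent"): 110,
    ("Partially Observable", "Model-Based Agent"): 93,
    ("Fully Observable", "Reflex Agent"): 117,
    ("Fully Observable", "Model-Based Agent"): 113,
}


def net_change(scores, start=INITIAL_PERFORMANCE):
    """Return how far each final score ended above or below the starting value."""
    return {key: value - start for key, value in scores.items()}


changes = net_change(final_performance)
for (environment_name, agent_name), delta in changes.items():
    final = final_performance[(environment_name, agent_name)]
    print(f"{environment_name:<22} {agent_name:<18} {final:>4} {delta:+d}")
```

Each value comes from a single run, and the initial agent positions shown in the output differ between runs, so these numbers are best read as one sample per configuration and not as a controlled comparison.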