In [1]:
from agents4e import *
from notebook import psource

%load_ext autoreload
%autoreload 2

# Constructing Environments

## The Environment class
The Environment class is a shell for all environments. It owns **things** and **agents**, and it specifies:
* The thing classes it can hold. These can be plain things (like dirt) or agents (like vacuums that can act)
* What each agent can perceive (`percept`: like what sensors are on my robot?)
* What each action does (`execute_action`). For example, if a vacuum sucks up dirt, the amount of dirt in its environment changes.
* Specify a default location for new things, like where more dirt might go.
* Specify spontaneous changes in the world that no agent caused (`exogenous_change`)
* Tell us if all of the agents are dead (`is_done`)
* Perform one time step in our environmental "game" (`step`):
  * Each agent perceives its surroundings
  * Each agent performs an action
* Perform a bunch of steps (`run`)
* List all the things at a location (`list_things_at`)
* Check whether at least one thing of a given class is at a location (`some_things_at`)
* Add a thing at a location (or the default location)
* Delete a specified thing
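To make that life cycle concrete, here is a minimal sketch of the same responsibilities in plain Python. It is not the library's code: `SketchThing`, `SketchAgent`, `MiniEnvironment`, and `CounterEnv` are hypothetical names (chosen so they don't shadow `Thing`, `Agent`, or `Environment` from `agents4e`), and the real `Environment` printed by `psource(Environment)` has more bookkeeping.

```python
from typing import Callable, List, Optional, Tuple

Location = Tuple[int, int]


class SketchThing:
    """Hypothetical stand-in for Thing: something with a location."""
    def __init__(self, location: Location = (0, 0)):
        self.location = location

    def is_alive(self) -> bool:
        return getattr(self, "alive", False)


class SketchAgent(SketchThing):
    """Hypothetical stand-in for Agent: a program maps a percept to an action."""
    def __init__(self, program: Callable, location: Location = (0, 0)):
        super().__init__(location)
        self.program = program
        self.alive = True
        self.performance = 0


class MiniEnvironment:
    """Hypothetical sketch of the Environment responsibilities listed above."""
    def __init__(self):
        self.things: List[SketchThing] = []
        self.agents: List[SketchAgent] = []

    def percept(self, agent):                 # what the agent senses
        raise NotImplementedError

    def execute_action(self, agent, action):  # how the action changes the world
        raise NotImplementedError

    def exogenous_change(self):               # spontaneous changes; none by default
        pass

    def is_done(self) -> bool:                # done when no agent is alive
        return not any(a.is_alive() for a in self.agents)

    def step(self):
        """One time step: every live agent perceives and picks an action,
        then all the chosen actions are executed."""
        if self.is_done():
            return
        live = [a for a in self.agents if a.is_alive()]
        actions = [a.program(self.percept(a)) for a in live]
        for agent, action in zip(live, actions):
            self.execute_action(agent, action)
        self.exogenous_change()

    def run(self, steps: int = 1000):
        for _ in range(steps):
            if self.is_done():
                return
            self.step()

    def add_thing(self, thing: SketchThing, location: Optional[Location] = None):
        if location is not None:
            thing.location = location
        self.things.append(thing)
        if isinstance(thing, SketchAgent):
            self.agents.append(thing)

    def list_things_at(self, location: Location, tclass=SketchThing):
        return [t for t in self.things
                if t.location == location and isinstance(t, tclass)]

    def some_things_at(self, location: Location, tclass=SketchThing) -> bool:
        return len(self.list_things_at(location, tclass)) > 0

    def delete_thing(self, thing: SketchThing):
        self.things.remove(thing)
        if thing in self.agents:
            self.agents.remove(thing)


class CounterEnv(MiniEnvironment):
    """Toy world: the percept is the agent's score, and 'Inc' raises it by 1."""
    def percept(self, agent):
        return agent.performance

    def execute_action(self, agent, action):
        if action == "Inc":
            agent.performance += 1


counter_env = CounterEnv()
bot = SketchAgent(program=lambda percept: "Inc")
counter_env.add_thing(bot)
counter_env.run(steps=5)
print(bot.performance)  # 5
```

The key design point is that an environment subclass only has to fill in `percept` and `execute_action`; the stepping loop is shared.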

In [2]:
psource(Environment)

## The Direction Class
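* Specify a heading (**R**ight, **L**eft, **U**p, **D**own)
* Turn right or left (adding `Direction.R` or `Direction.L` to a heading gives the new heading)
* Move forward one square along the current heading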
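A minimal sketch of that heading arithmetic, assuming screen-style coordinates in which y grows downward (so "up" decreases y). The names `TURN_RIGHT`, `TURN_LEFT`, `STEP`, `turn`, and `move_forward` are hypothetical and are not the `Direction` API.

```python
# Hypothetical heading tables; assumes y grows downward, so "up" is (0, -1).
TURN_RIGHT = {"right": "down", "down": "left", "left": "up", "up": "right"}
TURN_LEFT = {after: before for before, after in TURN_RIGHT.items()}
STEP = {"right": (1, 0), "left": (-1, 0), "up": (0, -1), "down": (0, 1)}


def turn(heading: str, way: str) -> str:
    """Return the new heading after turning 'right' or 'left'."""
    return TURN_RIGHT[heading] if way == "right" else TURN_LEFT[heading]


def move_forward(heading: str, location: tuple) -> tuple:
    """Return the square one step ahead along the heading."""
    dx, dy = STEP[heading]
    x, y = location
    return (x + dx, y + dy)


print(turn("right", "right"))       # down
print(move_forward("up", (3, 3)))   # (3, 2)
```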

In [3]:
psource(Direction)

## Environments on a plane: XYEnvironment
* Rectangle with width and height
  * also initialize a list of observers -- this might be a list provided to the GUI that tells us when things change
* Things near a location (based on perceptible_distance = 1 or specified radius)
* Return what things I can see
* Execute an action
  * bump against an edge
  * turn left or right
  * move forward
  * grab a thing
  * release a thing
* Add observers who get to find out what's happened
* Move to where I say to move:
  * If there's an *Obstacle* at my destination, I bump. Obstacles are their own trivial class that can be extended into more complicated obstacles that are sets of coordinates.
  * Otherwise tell all observers the thing moved, and move the thing from its old location to the new one
  * Return a True/False flag; the value returned is the thing's `bump` flag, so `True` means the move was blocked
* Add a thing to a location. Say what to do if there's a thing there.
* Check to see if the location some jerk specified is actually in my rectangle.
* Randomly choose a location in my rectangle, and maybe I'll list some patches that aren't allowed.
* Delete a thing from the environment. If that thing is an agent drop everything it's holding.
* Add walls so the vacuum doesn't fall down the stairs. A *Wall* is its own trivial class.
* Describe the new heading after a turn happens
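The bump-or-move rule can be sketched in a few lines. This is a hypothetical helper, not `XYEnvironment.move_to`: it represents obstacles as a plain set of squares and returns the new location together with a `bumped` flag, and `perimeter_walls` mimics the idea of walling in the border.

```python
from typing import Set, Tuple

Location = Tuple[int, int]


def move_to(location: Location, destination: Location,
            obstacles: Set[Location]) -> Tuple[Location, bool]:
    """Hypothetical helper: return (new_location, bumped).

    If an obstacle (for example a wall) occupies the destination, the thing
    stays put and bumped is True; otherwise it moves and bumped is False.
    """
    bumped = destination in obstacles
    return (location if bumped else destination), bumped


def perimeter_walls(width: int, height: int) -> Set[Location]:
    """Walls on every border square of a width x height grid."""
    return {(x, y) for x in range(width) for y in range(height)
            if x in (0, width - 1) or y in (0, height - 1)}


walls = perimeter_walls(4, 4)
print(move_to((1, 1), (0, 1), walls))  # ((1, 1), True): bumped into the wall
print(move_to((1, 1), (2, 1), walls))  # ((2, 1), False): moved
```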

In [4]:
psource(XYEnvironment)

## GraphicEnvironment(XYEnvironment)
Handles the GUI

In [5]:
psource(GraphicEnvironment)

## Example: Let's make a vacuum environment
<p align="center">
<img src="images/vacuum.svg">
</p>

* Define Dirt as a trivial Thing

* Start with a trivial two-square environment

* Extend the XYEnvironment to **VacuumEnvironment**:
  * The things are Wall, Dirt, and four agents (reflex, random, table-driven, and model-based) for vacuum behavior. We'll get to those.
  * The environment knows if an agent (a.k.a. a vacuum) is standing in dirt and if it bumped into something on its last move.
  * An agent can execute an action:
    * If the action is Suck and there is dirt, the agent earns 100 points (**performance**) and the dirt is deleted. Any other action is handed to the XYEnvironment, which handles movement. Every action except NoOp also costs 1 point.

### Trivial Vacuum Environment

This one just moves between two squares, so no headings or obstacle checks are needed, and it can suck if the floor is dirty.

In [6]:
psource(TrivialVacuumEnvironment)

### Nontrivial vacuum environment

Note that this environment can
* Tell if the floor is dirty or not
* Tell if the agent is bumping against an obstacle
* Execute the "suck" action and assign a reward **if** that action is successful/possible

You'll see that `execute_action` hands every action except Suck to the XYEnvironment's `execute_action`, which already handles how the robot vacuum moves (including checking for obstacles).

You'll also see that this basic environment can hold several types of vacuum agents, and we're going to pay attention to how the allowed agents behave differently.

Also let's pay attention to rewards. There's a "NoOp" action that is neutral. Otherwise, the only positive reward is actually picking up dirt (+100), and every action other than NoOp (Suck included) costs 1 point, so a successful Suck nets +99.
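The trace later in this notebook shows a successful Suck moving performance from -2 to 97, a net +99, which fits +100 for the dirt minus a 1-point cost for acting. A minimal sketch of that scoring rule, assuming exactly those two terms (`score_action` is a hypothetical name, not part of `agents4e`):

```python
def score_action(action: str, dirt_here: bool) -> int:
    """Hypothetical reward rule: +100 for sucking up dirt,
    and -1 for every action other than NoOp."""
    reward = 0
    if action == "Suck" and dirt_here:
        reward += 100
    if action != "NoOp":
        reward -= 1
    return reward


trace_actions = [("Forward", False), ("Suck", True), ("Forward", False), ("NoOp", False)]
score = 0
for action, dirt_here in trace_actions:
    score += score_action(action, dirt_here)
print(score)  # 97
```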

In [7]:
psource(VacuumEnvironment)

# Defining Agents

## An agent is a thing inside your environment

The initial **Thing** class is really just a shell. We'll want to define whether the thing is *alive*, how to show our thing's state, and how to draw a picture of it on a canvas. The **Agent** class is a subclass that performs actions based on what it perceives in the environment. To keep things general, this agent class takes a user-defined **program**: a FUNCTION that turns percepts into actions.

In [8]:
psource(Thing)

In [9]:
psource(Agent)

## How do we know what the agent did?

In [10]:
psource(TraceAgent)
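The idea behind tracing is to wrap the agent's program so that every percept/action pair is recorded before the action is returned. A minimal sketch in that spirit; `TinyAgent` and `trace_agent` are hypothetical, and this version appends to a list instead of printing so the log can be inspected.

```python
from typing import Callable, List


class TinyAgent:
    """Hypothetical stand-in for Agent: a program plus a performance score."""
    def __init__(self, program: Callable):
        self.program = program
        self.performance = 0


def trace_agent(agent: TinyAgent, log: List[str]) -> TinyAgent:
    """Wrap agent.program so each percept and chosen action is logged."""
    old_program = agent.program

    def new_program(percept):
        action = old_program(percept)
        log.append(f"<Agent> perceives {percept} and does {action} "
                   f"with performance {agent.performance}")
        return action

    agent.program = new_program
    return agent


trace_log: List[str] = []
traced = trace_agent(TinyAgent(lambda p: "Suck" if p == "Dirty" else "Right"), trace_log)
traced.program("Dirty")
traced.program("Clean")
print("\n".join(trace_log))
```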


## BORING AGENT EXAMPLES

### Random

In [11]:
psource(RandomAgentProgram)

In [12]:
psource(RandomVacuumAgent)

### Table-driven agents

In [13]:
psource(TableDrivenAgentProgram)

In [14]:
psource(TableDrivenVacuumAgent)

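Can you imagine making a table for the 2D environment? The table needs one entry for every possible percept *sequence*, so it grows exponentially with how long the agent runs. This is just not scalable!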
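The blow-up is easy to quantify. A sketch, assuming the table is keyed on the full percept sequence (which is how `TableDrivenAgentProgram` looks up its action): with `P` possible percepts and a lifetime of `T` steps, it needs P + P² + ... + P^T entries. `table_entries` is a hypothetical helper.

```python
def table_entries(num_percepts: int, lifetime: int) -> int:
    """Rows a table-driven agent needs: one per possible percept sequence of
    length 1..lifetime, i.e. num_percepts**1 + ... + num_percepts**lifetime."""
    return sum(num_percepts ** t for t in range(1, lifetime + 1))


# Both vacuum worlds here have 4 possible percepts: (location A/B, 'Dirty'/'Clean')
# in the trivial one, and ('Dirty'/'Clean', 'Bump'/'None') in the XY one.
print(table_entries(4, 3))                # 84
print(len(str(table_entries(4, 5000))))   # digits needed for a 5,000-step run (over 3,000)
```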

## REFLEX AGENTS

Here's the take-away: a reflex agent has condition-action rules that choose its behavior based only on what it currently perceives in its environment. In reinforcement learning, we typically call such a mapping from situations to actions a **policy**.
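A minimal sketch of the rule-matching idea, assuming rules are (condition, action) pairs checked in order and the first match wins. `simple_reflex_program` and the rules below are hypothetical; they are written for the two-square world, where a percept is `(location, status)`.

```python
from typing import Callable, List, Tuple

# A rule pairs a condition on the interpreted state with an action.
Rule = Tuple[Callable[[dict], bool], str]


def simple_reflex_program(rules: List[Rule], interpret_input: Callable) -> Callable:
    """Return a program that fires the first rule whose condition matches."""
    def program(percept):
        state = interpret_input(percept)
        for condition, action in rules:
            if condition(state):
                return action
        return "NoOp"  # no rule matched
    return program


rules = [
    (lambda s: s["status"] == "Dirty", "Suck"),
    (lambda s: s["location"] == (0, 0), "Right"),
    (lambda s: s["location"] == (1, 0), "Left"),
]
reflex_program_sketch = simple_reflex_program(
    rules, lambda p: {"location": p[0], "status": p[1]})
print(reflex_program_sketch(((0, 0), "Dirty")))  # Suck
print(reflex_program_sketch(((1, 0), "Clean")))  # Left
```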

In [15]:
psource(SimpleReflexAgentProgram)

### The ReflexVacuumAgent


<p align="center">
<img src="images/simple_reflex_agent.jpg">
</p>

Perceive the location and status
* If the status is dirty, suck
* Otherwise move to the other location

#### Trivial Reflex Vacuum Agent



In [16]:
psource(ReflexVacuumAgent)

##### Running the trivial reflex vacuum agent

In [17]:
agent = ReflexVacuumAgent()
dirt = Dirt()
environment = TrivialVacuumEnvironment()
environment.add_thing(agent)
environment.status


{(0, 0): 'Dirty', (1, 0): 'Clean'}

In [18]:

environment.run()
environment.status


{(0, 0): 'Clean', (1, 0): 'Clean'}

#### Nontrivial Vacuum Agent

I wanted to think this through before I saw their solution.

So maybe we'd say
* If the location is dirty, suck (performance +100, minus the usual 1 for acting)
* Otherwise, if there was no bump, move forward (performance -1)
* Otherwise turn left or right, probably chosen at random (performance -1)

The agent carries its heading as a `Direction` object (the class defined above). The XYEnvironment already uses that heading to implement `TurnLeft`, `TurnRight`, and `Forward`, and the environment itself tracks the agent's location.

In [19]:
import random
from agents4e import Agent, Direction

def XYReflexVacuumAgent():
    """Extend the trivial example to the XY world.

    The VacuumEnvironment's percept is
    ('Dirty'/'Clean', 'Bump'/'None').

    The agent carries its own heading (a Direction); the
    XYEnvironment updates it on TurnLeft/TurnRight and uses it
    to move Forward.
    """

    def program(percept):
        status, bump = percept
        if status == 'Dirty':
            return 'Suck'
        elif bump == 'Bump':
            # Blocked: pick a new heading at random
            return random.choice(['TurnLeft', 'TurnRight'])
        else:
            return 'Forward'

    # Agent's constructor takes the program; attach the starting heading afterwards
    agent = Agent(program)
    agent.direction = Direction(random.choice(["left", "right", "up", "down"]))
    return agent


In [None]:
dirt = Dirt()
dirt.is_alive()

False

In [None]:
vacuum = XYReflexVacuumAgent()
environment = VacuumEnvironment()
environment.add_thing(vacuum)
for idx in range(10):
    dirt = Dirt()
    environment.add_thing(dirt, [random.randint(0,9),random.randint(0,9)])
vacuum.performance


0

In [None]:
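def stat(has_dirt):
    """Label a square for printing: 'DIRT' if dirty, ' c  ' if clean."""
    return "DIRT" if has_dirt else " c  "

def show_dirt(env):
    # Each printed row is one x value; the columns are y = 0 .. height-1
    for x in range(env.width):
        dirt_print = [stat(env.some_things_at((x, y), Dirt)) for y in range(env.height)]
        print(dirt_print)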

In [None]:
show_dirt(environment)

[' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ']
[' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ']
[' c  ', 'DIRT', 'DIRT', ' c  ', ' c  ', 'DIRT', 'DIRT', ' c  ', ' c  ', ' c  ']
[' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ']
[' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ']
[' c  ', 'DIRT', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', 'DIRT', ' c  ', ' c  ']
[' c  ', ' c  ', ' c  ', ' c  ', 'DIRT', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ']
[' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', 'DIRT', ' c  ']
[' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ']
[' c  ', ' c  ', 'DIRT', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ']


In [None]:
TraceAgent(vacuum)
environment.run(steps = 5000)


<Agent> perceives ('Clean', 'None') and does Forward with performance 0
<Agent> perceives ('Clean', 'None') and does Forward with performance -1
<Agent> perceives ('Dirty', 'None') and does Suck with performance -2
<Agent> perceives ('Clean', 'None') and does Forward with performance 97
<Agent> perceives ('Clean', 'None') and does Forward with performance 96
<Agent> perceives ('Clean', 'Bump') and does TurnLeft with performance 95
<Agent> perceives ('Clean', 'None') and does Forward with performance 94
<Agent> perceives ('Clean', 'None') and does Forward with performance 93
<Agent> perceives ('Clean', 'None') and does Forward with performance 92
<Agent> perceives ('Clean', 'None') and does Forward with performance 91
<Agent> perceives ('Clean', 'None') and does Forward with performance 90
<Agent> perceives ('Clean', 'None') and does Forward with performance 89
<Agent> perceives ('Clean', 'None') and does Forward with performance 88
<Agent> perceives ('Clean', 'Bump') and does TurnRight

In [None]:
vacuum.performance  # the XY agent is `vacuum`; `agent` still refers to the trivial ReflexVacuumAgent

In [None]:
show_dirt(environment)

[' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ']
[' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ']
[' c  ', ' c  ', ' c  ', ' c  ', ' c  ', 'DIRT', 'DIRT', ' c  ', ' c  ', ' c  ']
[' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ']
[' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ']
[' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', 'DIRT', ' c  ', ' c  ']
[' c  ', ' c  ', ' c  ', ' c  ', 'DIRT', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ']
[' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ']
[' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ']
[' c  ', ' c  ', 'DIRT', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ', ' c  ']


#### Performance
It kind of sucks: after 5,000 steps, five of the nine dirty squares in the grid above are still dirty. My vacuum isn't smart enough.

## MODEL-BASED REFLEX AGENTS 

Let's recall the diagram for the simple reflex agent

<p align="center">
<img src="images/simple_reflex_agent.jpg">
</p>

and compare to this new model-based agent

<p align="center">
<img src="images/model_based_reflex_agent.jpg">
</p>

and you can see that now the robot keeps an internal state, plus a model of how the world evolves and what its own actions do, so it can keep track of the parts of the world it can't currently see.

In [None]:
psource(ModelBasedReflexAgentProgram)
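Here is one way the model-based idea could look for the XY vacuum. It is a hypothetical sketch, not the library's `ModelBasedReflexAgentProgram`: the "model" is plain dead reckoning (a Forward without a Bump is assumed to move one square, and turns change the heading), the internal state is the believed position plus the set of squares already visited, and headings assume y grows downward. `ModelBasedXYProgram` is an invented name.

```python
import random
from typing import Optional, Set, Tuple

STEP = {"right": (1, 0), "down": (0, 1), "left": (-1, 0), "up": (0, -1)}
RIGHT_OF = {"right": "down", "down": "left", "left": "up", "up": "right"}
LEFT_OF = {after: before for before, after in RIGHT_OF.items()}


class ModelBasedXYProgram:
    """Hypothetical reflex program with internal state for the XY vacuum."""

    def __init__(self, heading: str = "right", seed: Optional[int] = None):
        self.position: Tuple[int, int] = (0, 0)   # relative to the start square
        self.heading = heading
        self.visited: Set[Tuple[int, int]] = {self.position}
        self.last_action: Optional[str] = None
        self.rng = random.Random(seed)

    def ahead(self) -> Tuple[int, int]:
        dx, dy = STEP[self.heading]
        return (self.position[0] + dx, self.position[1] + dy)

    def update_state(self, bump: str) -> None:
        """Model of what actions do: an unblocked Forward moves one square."""
        if self.last_action == "Forward" and bump != "Bump":
            self.position = self.ahead()
            self.visited.add(self.position)

    def __call__(self, percept) -> str:
        status, bump = percept
        self.update_state(bump)
        if status == "Dirty":
            action = "Suck"
        elif bump == "Bump" or (self.ahead() in self.visited and self.rng.random() < 0.5):
            # Blocked, or about to re-walk a known square: try a new heading
            action = self.rng.choice(["TurnLeft", "TurnRight"])
        else:
            action = "Forward"
        # Keep the believed heading in sync with the turn we just chose
        if action == "TurnLeft":
            self.heading = LEFT_OF[self.heading]
        elif action == "TurnRight":
            self.heading = RIGHT_OF[self.heading]
        self.last_action = action
        return action
```

Because instances are callable, one could presumably pass `ModelBasedXYProgram(heading=...)` as an agent's program, as long as the agent's `direction` starts on the same heading; that wiring is untested here.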

### Trivial Model-Based Vacuum Agent

In [None]:
psource(ModelBasedVacuumAgent)

This really isn't much better for what we care about. In the two-square world it only knows to stop (return NoOp) once it has seen both squares clean, which saves the 1 point the reflex agent keeps paying for every pointless move. In my 2D example, though, it couldn't find all of the spots to clean efficiently, so it would never stop anyhow. That's because it really needs to figure out which movements were helpful and then learn how to move more efficiently. To do that we need to incorporate the performance into its movement rules.
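In the two-square world, stopping does pay off, and that is easy to check with a small self-contained simulation. The scoring (+10 per dirty square cleaned, -1 per Right/Left move, nothing for NoOp) is an assumption modelled on `TrivialVacuumEnvironment`; `reflex_program`, `make_model_based_program`, and `run_trivial` are hypothetical names.

```python
A, B = (0, 0), (1, 0)


def reflex_program(percept):
    location, status = percept
    if status == "Dirty":
        return "Suck"
    return "Right" if location == A else "Left"


def make_model_based_program():
    model = {A: None, B: None}      # internal state: last known status of each square

    def program(percept):
        location, status = percept
        model[location] = status    # update the model from the percept
        if model[A] == model[B] == "Clean":
            return "NoOp"           # both squares known clean: stop paying to move
        if status == "Dirty":
            return "Suck"
        return "Right" if location == A else "Left"
    return program


def run_trivial(program, status, location=A, steps=1000):
    """Simulate the two-square world under the assumed scoring."""
    status = dict(status)
    performance = 0
    for _ in range(steps):
        action = program((location, status[location]))
        if action == "Right":
            location, performance = B, performance - 1
        elif action == "Left":
            location, performance = A, performance - 1
        elif action == "Suck":
            if status[location] == "Dirty":
                performance += 10
            status[location] = "Clean"
    return performance


both_dirty = {A: "Dirty", B: "Dirty"}
print(run_trivial(reflex_program, both_dirty))               # -978
print(run_trivial(make_model_based_program(), both_dirty))   # 19
```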

## GOAL-BASED AGENTS

<p align="center">
<img src="images/model_based_reflex_agent.jpg">
</p>

<p align="center">
<img src="images/model_goal_based_agent.jpg">
</p>

You can see that we're now asking which actions will lead to a desired (goal) state: the agent predicts "what will it be like if I do action A?" and checks that against its goal. So, for example, could we decide which direction to turn based on whether there's a dirty patch to the left or to the right?
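A minimal goal-based sketch, with a swapped-in technique named plainly: breadth-first search. It assumes the agent already knows where the walls and dirt are (our XY vacuum does not), and it plans in absolute moves (`right`/`left`/`up`/`down`, with y growing downward) rather than `TurnLeft`/`TurnRight`/`Forward`. `plan_to_nearest_dirt` is a hypothetical name.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Location = Tuple[int, int]
STEP = {"right": (1, 0), "left": (-1, 0), "up": (0, -1), "down": (0, 1)}


def plan_to_nearest_dirt(start: Location, dirt: Set[Location], walls: Set[Location],
                         width: int, height: int) -> Optional[List[str]]:
    """Breadth-first search: predict where each move leads (the model) and stop
    at the first square that satisfies the goal test 'this square is dirty'.
    Returns the moves followed by 'Suck', or None if no dirt is reachable."""
    frontier = deque([start])
    came_from: Dict[Location, Optional[Tuple[Location, str]]] = {start: None}
    while frontier:
        here = frontier.popleft()
        if here in dirt:                                    # goal test
            path: List[str] = []
            while came_from[here] is not None:
                here, move = came_from[here]                # walk back to the start
                path.append(move)
            return path[::-1] + ["Suck"]
        for move, (dx, dy) in STEP.items():
            nxt = (here[0] + dx, here[1] + dy)              # predicted result of the move
            if (0 <= nxt[0] < width and 0 <= nxt[1] < height
                    and nxt not in walls and nxt not in came_from):
                came_from[nxt] = (here, move)
                frontier.append(nxt)
    return None


print(plan_to_nearest_dirt((1, 1), {(3, 1)}, set(), 5, 5))   # ['right', 'right', 'Suck']
```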


## UTILITY-BASED VS. GOAL-BASED AGENTS

<p align="center">
<img src="images/model_goal_based_agent.jpg">
</p>
<p align="center">
<img src="images/model_utility_based_agent.jpg">
</p>

Now we're not focused on landing in a particular goal state; instead, a **utility** function scores how much we like each state, and we move toward states we like more. We'll learn in later sections how you can train a model by moving through your environment and learning which steps *eventually* get you to the states you like. So the utility of a move is related to whether it's getting you closer to your end goal. So could we know that, long-term, we're more likely to clean the most patches by moving in a consistent way? I think I'm probably getting ahead of myself; this is all leading us to agents that learn.
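A minimal sketch of utility-based choice, assuming the agent has probabilities for each action's outcomes and a utility number for each resulting state; it picks the action with the highest expected utility, the sum of probability × utility over outcomes. All the numbers and state names here are made up for illustration.

```python
from typing import Dict, List, Tuple

# outcomes[action] = list of (probability, resulting state)
Outcomes = Dict[str, List[Tuple[float, str]]]


def expected_utility(action: str, outcomes: Outcomes, utility: Dict[str, float]) -> float:
    return sum(p * utility[s] for p, s in outcomes[action])


def best_action(outcomes: Outcomes, utility: Dict[str, float]) -> str:
    """Pick the action with the highest expected utility."""
    return max(outcomes, key=lambda a: expected_utility(a, outcomes, utility))


# Hypothetical numbers: which way is a dirty patch more likely to be?
utility = {"cleaned_patch": 99.0, "nothing_found": -1.0, "bumped": -1.0}
outcomes = {
    "TurnLeft":  [(0.3, "cleaned_patch"), (0.7, "nothing_found")],
    "TurnRight": [(0.1, "cleaned_patch"), (0.9, "nothing_found")],
    "Forward":   [(0.5, "bumped"), (0.5, "nothing_found")],
}
print(best_action(outcomes, utility))   # TurnLeft
```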

## AGENTS THAT LEARN

<p align="center">
<img src="images/model_goal_based_agent.jpg">
</p>

<p align="center">
<img src="images/general_learning_agent.jpg">
</p>

The **learning element** is responsible for making improvements, while the **performance element** actually makes the current choice of action; the performance element is what we treated as the entire agent in the previous designs. The **critic** evaluates the agent's performance (how did the environment change after the agent's last action?), and the learning element uses this feedback to update the logic in the performance element.

The role of the **problem generator** is to figure out what actions we should take so that we learn the most from our experiments.
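A deliberately tiny sketch of how the four pieces can fit together, assuming a setting where each action has a fixed payoff (a multi-armed-bandit simplification, not the vacuum world). An epsilon-greedy choice stands in for the problem generator; `LearningAgentSketch` and its method names are hypothetical.

```python
import random
from typing import Dict, List


class LearningAgentSketch:
    """Hypothetical learning agent split into the four components above."""

    def __init__(self, actions: List[str], epsilon: float = 0.1, seed: int = 0):
        self.actions = actions
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.estimates: Dict[str, float] = {a: 0.0 for a in actions}
        self.counts: Dict[str, int] = {a: 0 for a in actions}

    def performance_element(self) -> str:
        # Choose the action that currently looks best
        return max(self.actions, key=lambda a: self.estimates[a])

    def problem_generator(self) -> str:
        # Suggest an exploratory action we might learn something from
        return self.rng.choice(self.actions)

    def choose(self) -> str:
        if self.rng.random() < self.epsilon:
            return self.problem_generator()
        return self.performance_element()

    @staticmethod
    def critic(performance_before: float, performance_after: float) -> float:
        # Turn the change in performance into a reward signal
        return performance_after - performance_before

    def learning_element(self, action: str, reward: float) -> None:
        # Incremental average of the rewards seen for this action
        self.counts[action] += 1
        self.estimates[action] += (reward - self.estimates[action]) / self.counts[action]


payoffs = {"Forward": -1.0, "Suck": 9.0}   # hypothetical fixed payoffs
learner = LearningAgentSketch(["Forward", "Suck"], epsilon=0.1, seed=0)
score = 0.0
for _ in range(200):
    action = learner.choose()
    before = score
    score += payoffs[action]
    learner.learning_element(action, learner.critic(before, score))
print(learner.performance_element())   # Suck
```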

# How can we best describe the state of the environment?

<p align="center">
<img src="images/atomic-factored-structured.png">
</p>

## Atomic

The state is a single indivisible label, such as a string or a number, with no internal structure.

## Factored representation

A preset set of variables (attributes), each with a value: a list of key-value pairs.

## Structured representation

Describes the objects in the environment and the relationships between them.
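To compare the three, here is the same two-square vacuum state written each way. This is an illustrative sketch; the labels, keys, and the `Square`/`Relation` classes are hypothetical.

```python
from dataclasses import dataclass

# Atomic: the whole state is one opaque label; we can only test equality.
atomic_state = "A-clean_B-dirty_vacuum-at-A"

# Factored: a fixed set of variables, each with a value.
factored_state = {"vacuum_at": "A", "A_status": "Clean", "B_status": "Dirty"}


# Structured: objects plus relationships between them.
@dataclass(frozen=True)
class Square:
    name: str
    dirty: bool


@dataclass(frozen=True)
class Relation:
    name: str
    subject: str
    obj: str


squares = [Square("A", dirty=False), Square("B", dirty=True)]
relations = [Relation("right_of", "B", "A"), Relation("in", "vacuum", "A")]

# Questions each representation can answer directly:
print(factored_state["B_status"])                  # Dirty
print([s.name for s in squares if s.dirty])        # ['B']
print([r.obj for r in relations
       if r.name == "in" and r.subject == "vacuum"])  # ['A']
```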