### 2.9

__Implement a performance-measuring environment simulator for the vacuum-cleaner world depicted in Figure 2.2 and specified on page 38. Your implementation should be modular so that the sensors, actuators, and environment characteristics (size, shape, dirt placement, etc.) can be changed easily. (_Note:_ for some choices of programming language and operating system there are already implementations in the [online code repository](http://aima.cs.berkeley.edu/code.html).)__

The world in Figure 2.2 has two squares, "A" and "B". I have implemented this as `vacuum_cleaner_world.environment.SimpleVacuumWorld`.

The specifications from page 38 are as follows:

* The performance measure awards one point for each clean square at each time step, over a "lifetime" of 1000 time steps.
* The geography of the environment is known _a priori_ but the dirt distribution and the initial location of the agent are not. Clean squares stay clean and sucking cleans the current square. The _Left_ and _Right_ actions move the agent left and right except when this would take the agent outside the environment, in which case the agent remains where it is.
* The only available actions are _Left_, _Right_, and _Suck_.
* The agent correctly perceives its location and whether that location contains dirt.
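
The four bullet points above translate fairly directly into a simulation loop. The following is a minimal sketch, not the `SimpleVacuumWorld` class from the package: the names `run_two_square_world` and `reflex_agent` are hypothetical, and scoring each time step after the agent acts is an assumption about timing.

```python
import random

def run_two_square_world(agent, steps=1000, dirt=None, location=None, rng=None):
    """Hypothetical two-square simulator (illustrative only).

    agent: callable taking a percept (location, is_dirty) and returning
           'Left', 'Right', 'Suck', or anything else for "do nothing".
    Returns one point per clean square per time step, counted after the
    agent acts (an assumption about when the score is taken).
    """
    rng = rng or random.Random()
    # Dirt placement and start square are unknown to the agent.
    dirt = dict(dirt) if dirt is not None else {sq: rng.random() < 0.5 for sq in 'AB'}
    location = location or rng.choice('AB')
    score = 0
    for _ in range(steps):
        action = agent((location, dirt[location]))
        if action == 'Suck':
            dirt[location] = False          # sucking cleans the current square
        elif action == 'Left':
            location = 'A'                  # already in A: the agent stays put
        elif action == 'Right':
            location = 'B'                  # already in B: the agent stays put
        score += sum(not is_dirty for is_dirty in dirt.values())
    return score

def reflex_agent(percept):
    """Suck if dirty, otherwise head for the other square."""
    location, is_dirty = percept
    if is_dirty:
        return 'Suck'
    return 'Right' if location == 'A' else 'Left'

print(run_two_square_world(reflex_agent, dirt={'A': True, 'B': True}, location='A'))
```

Because the agent is just a callable and the dirt and start location are parameters, sensors, actuators, and the initial configuration can each be swapped without touching the loop.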

In [1]:
from vacuum_cleaner_world.environment import SimpleVacuumWorld

### 2.10

__Consider a modified version of the vacuum environment in Exercise 2.9, in which the agent is penalized one point for each movement.__

__a. Can a simple reflex agent be perfectly rational for this environment? Explain.__

A simple reflex agent cannot be perfectly rational here. Sucking is clearly a better response to dirt than moving, so the rules for percepts `[A, Dirt]` and `[B, Dirt]` must both return `Suck`. For `[A, No Dirt]`, a simple reflex agent can return `Suck`, `Left`, or `Right`.

`Suck` or `Left` keeps the agent in square A for its whole lifetime, so its expected performance is poor whenever square B may contain dirt. A reflex agent that instead returns `Right` for `[A, No Dirt]` and `Left` for `[B, No Dirt]` keeps oscillating after both squares are clean, paying the movement penalty at every step. An agent that stays still after visiting both squares would perform better, but no set of condition-action rules, and therefore no simple reflex agent, can implement such an agent function, because the rules see only the current percept.
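
A short simulation makes the cost of oscillating concrete. The sketch below is not the author's `SimpleReflexAgent` or `SimpleVacuumWorld`; the function names are hypothetical, and it assumes each time step scores one point per clean square after the agent acts, minus one point per move.

```python
def simple_reflex_agent(percept):
    """Condition-action rules only: no memory of earlier percepts."""
    location, is_dirty = percept
    if is_dirty:
        return 'Suck'
    return 'Right' if location == 'A' else 'Left'

def score_with_move_penalty(agent, dirt, location, steps=1000):
    """Return (score, moves) for a two-square world with a one-point move penalty."""
    dirt = dict(dirt)
    clean_points = moves = 0
    for _ in range(steps):
        action = agent((location, dirt[location]))
        if action == 'Suck':
            dirt[location] = False
        elif action in ('Left', 'Right'):
            location = 'A' if action == 'Left' else 'B'
            moves += 1  # this agent never bumps into a wall, so every move is a real one
        clean_points += sum(not is_dirty for is_dirty in dirt.values())
    return clean_points - moves, moves

score, moves = score_with_move_penalty(simple_reflex_agent, {'A': True, 'B': True}, 'A')
print(score, moves)
```

Starting in A with both squares dirty, the agent needs only one move but makes 998 of them, which drags its score down to 1000.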

In [2]:
from vacuum_cleaner_world.agents import SimpleReflexAgent

In [3]:
env = SimpleVacuumWorld(move_penalty=True)

trials = 100
scores = [env.simulate(SimpleReflexAgent) for _ in range(trials)]

print('SimpleReflexAgent had an average performance of {:.0f} over {} trials.'.format(
    sum(scores)/len(scores), 
    trials))

SimpleReflexAgent had an average performance of 1000 over 100 trials.


Showing that some other agent can score higher than this would be sufficient to prove that `SimpleReflexAgent` is not rational.

__b. What about a reflex agent with state? Design such an agent.__

Since it takes just one move to visit every square, a reflex agent with state only needs to keep track of how many moves it has made, and return `NoOp` once that count equals 1 and its current square is clean.

The stateful reflex agent is implemented as `vacuum_cleaner_world.agents.StatefulReflexAgent`.
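
As an illustration of the idea, a reflex agent with a single counter might look like the sketch below. The class name `OneMoveReflexAgent` and its call interface are hypothetical; this is not the code of `StatefulReflexAgent`, only one way the design described above could be written.

```python
class OneMoveReflexAgent:
    """Hypothetical reflex agent with state: it remembers how many moves it has made."""

    def __init__(self):
        self.moves_made = 0

    def __call__(self, percept):
        location, is_dirty = percept
        if is_dirty:
            return 'Suck'                 # dirt always takes priority
        if self.moves_made >= 1:
            return 'NoOp'                 # both squares have been visited
        self.moves_made += 1
        return 'Right' if location == 'A' else 'Left'

agent = OneMoveReflexAgent()
percepts = [('A', True), ('A', False), ('B', True), ('B', False), ('B', False)]
actions = [agent(p) for p in percepts]
print(actions)
```

For the percept sequence shown (start in A, both squares dirty), the agent sucks, moves once, sucks again, and then stays still.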

The maximum attainable scores, assuming that the first score is taken after the agent's first action, are:

* __1999__ if there is initially no dirt, or dirt only in the agent's starting square
* __1998__ if there is dirt only in the opposite square
* __1997__ if there is initially dirt in both squares
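
These totals can be checked by counting lost points against the ceiling of 2000 (two clean squares for 1000 time steps). The helper below is a hypothetical worked calculation, not part of the package. It assumes the agent cannot see the other square and therefore always makes exactly one exploratory move, sucking first if its own square is dirty.

```python
def best_score(dirty_here, dirty_there, steps=1000):
    """Best score for an agent that perceives only its own square (illustrative)."""
    # Time steps that are scored while the far square is still dirty:
    # the step spent moving, plus one more if the agent had to suck first.
    far_dirty_steps = (1 + int(dirty_here)) if dirty_there else 0
    move_cost = 1  # the single move needed to inspect the other square
    return 2 * steps - far_dirty_steps - move_cost

for dirty_here in (False, True):
    for dirty_there in (False, True):
        print(dirty_here, dirty_there, best_score(dirty_here, dirty_there))
```

Dirt in the starting square costs nothing extra because it is removed before the first score is taken.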

By manually setting the initial dirt and agent location, I show that the `StatefulReflexAgent` attains these maximum scores in every configuration, and is therefore rational.

In [4]:
from vacuum_cleaner_world.agents import StatefulReflexAgent

In [5]:
def simulate_all_dirt_agent_configs(AgentObject, **kwargs):
    """Run one simulation per initial dirt/agent configuration and collect the scores."""
    # The three lists line up by index. Each dirt list reads as [A, B]
    # with 1 = dirty, matching the names below.
    dirt_inits = ['dirty', [0, 1], [1, 0], [0, 1], [1, 0], 'clean']
    agent_inits = [None, 'A', 'B', 'B', 'A', None]
    simulation_names = ['Both Squares Dirty',
                        'Agent in A, Dirt in B',
                        'Agent in B, Dirt in A',
                        'Agent in B, Dirt in B',
                        'Agent in A, Dirt in A',
                        'Both Squares Clean']

    scores = []
    for dirt_init, agent_init, name in zip(dirt_inits, agent_inits, simulation_names):
        # A fresh environment per configuration keeps the runs independent.
        env = SimpleVacuumWorld(dirt_init=dirt_init, init_loc=agent_init, **kwargs)
        score = env.simulate(AgentObject)
        scores.append(score)

    return simulation_names, scores

In [6]:
names, scores = simulate_all_dirt_agent_configs(StatefulReflexAgent, move_penalty=True)
for name, score in zip(names, scores):
    print('{}: {}'.format(name, score))

Both Squares Dirty: 1997
Agent in A, Dirt in B: 1998
Agent in B, Dirt in A: 1998
Agent in B, Dirt in B: 1999
Agent in A, Dirt in A: 1999
Both Squares Clean: 1999


__c. How do your answers to *a* and *b* change if the agent's percepts give it the clean/dirty status of every square in the environment?__

A simple reflex agent can be perfectly rational in this environment. Therefore, a reflex agent with state can also be perfectly rational, since it can have the same set of condition-action rules as the simple reflex agent, and maintain a state variable that does not do anything (e.g. always being equal to 0).

I implement a simple reflex agent with access to all percepts as `vacuum_cleaner_world.agents.FullInfoReflexAgent`. Its condition-action rules are:

| Percepts                    | Action | 
| --------------------------- |:------:|
| (`A`, `Dirt A`, `Dirt B`)   | `Suck` | 
| (`B`, `Dirt A`, `Dirt B`)   | `Suck` | 
| (`B`, `Clean A`, `Dirt B`)  | `Suck` | 
| (`A`, `Dirt A`, `Clean B`)  | `Suck` | 
| (`A`, `Clean A`, `Dirt B`)  | `Right`| 
| (`B`, `Dirt A`, `Clean B`)  | `Left` | 
| (`A`, `Clean A`, `Clean B`) | `NoOp` | 
| (`B`, `Clean A`, `Clean B`) | `NoOp` | 
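
Because the table covers every possible percept, the agent reduces to a pure lookup with no state. The sketch below encodes the table as a dictionary. The names `RULES` and `full_info_reflex_agent` and the tuple encoding of percepts are illustrative assumptions, not the interface of `FullInfoReflexAgent`.

```python
# Percept: (location, A is dirty, B is dirty) -> action, mirroring the table above.
RULES = {
    ('A', True,  True):  'Suck',
    ('B', True,  True):  'Suck',
    ('B', False, True):  'Suck',
    ('A', True,  False): 'Suck',
    ('A', False, True):  'Right',
    ('B', True,  False): 'Left',
    ('A', False, False): 'NoOp',
    ('B', False, False): 'NoOp',
}

def full_info_reflex_agent(percept):
    """Simple reflex agent: the action depends only on the current percept."""
    return RULES[percept]

print(full_info_reflex_agent(('A', False, True)))
```

The two `NoOp` rows are what the agent in part __a__ could not express: with the status of both squares in the percept, "nothing left to do" becomes a condition the rules can test.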

In [7]:
from vacuum_cleaner_world.agents import FullInfoReflexAgent

In [8]:
names, scores = simulate_all_dirt_agent_configs(FullInfoReflexAgent, 
                                                move_penalty=True, 
                                                perfect_information=True)

for name, score in zip(names, scores):
    print('{}: {}'.format(name, score))

Both Squares Dirty: 1997
Agent in A, Dirt in B: 1998
Agent in B, Dirt in A: 1998
Agent in B, Dirt in B: 2000
Agent in A, Dirt in A: 2000
Both Squares Clean: 2000


Because this agent receives more percepts, its standard of rationality is higher. It can achieve a perfect score of 2000 because it knows with certainty that there is no need to visit the other square when that square is initially clean. Note that the reflex agent with state from __b__ is still rational: given its more limited percepts, its expected performance is lower if it does not visit each square once, since there is some probability of finding dirt in the square it did not start in.

### 2.11 

**Consider a modified version of the vacuum environment in Exercise 2.9, in which the geography of the environment - its extent, boundaries, and obstacles - is unknown, as is the initial dirt configuration. (The agent can go _Up_ and _Down_ as well as _Left_ and _Right_.)**

__a. Can a simple reflex agent be perfectly rational for this environment? Explain.__

A simple reflex agent cannot be perfectly rational, because it has no memory and so cannot build the model of the environment that it would need in order to explore systematically. Even if it were to perceive that it was in square `A`, this information would mean nothing, because the agent would not know which squares, if any, were next to square `A`. Since the geography of the environment is unknown, the only percept information a reflex agent can act on is whether or not there is dirt in its current square. Therefore, the only implementable agent functions are those where the agent sucks when its current square is dirty and otherwise always moves in the same single direction.

__b. Can a simple reflex agent with a _randomized_ agent function outperform a simple reflex agent? Design such an agent and measure its performance on several environments.__

An agent that moves randomly when its current square is clean will outperform an agent that moves in a single direction in most environments, except those with the geography of a straight line where the simple reflex agent happens to start at one end and has condition-action rules that cause it to move towards the other end after cleaning. Outside of these special cases, random movements will explore the environment more effectively, if not efficiently.
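
The comparison can be sketched on a small grid. Everything in the block below is hypothetical: the 3x3 room with a central obstacle, the function names, and the seed are made up for illustration and are not the author's `UnknownVacuumWorld` or `RandomizedReflexAgent`. The score is one point per clean square per time step, counted after the agent acts, and every square starts dirty.

```python
import random

MOVES = {'Up': (0, -1), 'Down': (0, 1), 'Left': (-1, 0), 'Right': (1, 0)}

def simulate_grid(agent, free_squares, start, steps=1000):
    """Score an agent that perceives only whether its current square is dirty."""
    dirty = set(free_squares)          # every reachable square starts dirty
    position = start
    score = 0
    for _ in range(steps):
        action = agent(position in dirty)
        if action == 'Suck':
            dirty.discard(position)
        else:
            dx, dy = MOVES[action]
            target = (position[0] + dx, position[1] + dy)
            if target in free_squares:  # walls and obstacles block the move
                position = target
        score += len(free_squares) - len(dirty)
    return score

def fixed_direction_agent(is_dirty):
    """Deterministic simple reflex agent: suck, otherwise always go right."""
    return 'Suck' if is_dirty else 'Right'

def make_randomized_agent(seed=0):
    """Randomized reflex agent: suck, otherwise pick a direction at random."""
    rng = random.Random(seed)
    def agent(is_dirty):
        return 'Suck' if is_dirty else rng.choice(list(MOVES))
    return agent

# 3x3 room with an obstacle in the middle, leaving a ring of 8 squares.
grid = {(x, y) for x in range(3) for y in range(3)} - {(1, 1)}
fixed_score = simulate_grid(fixed_direction_agent, grid, start=(0, 0))
random_score = simulate_grid(make_randomized_agent(), grid, start=(0, 0))
print(fixed_score, random_score)
```

The fixed-direction agent cleans the three squares of the top row and then pushes against the wall for the rest of its lifetime, while the randomized agent keeps wandering and eventually reaches squares the other never sees.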

In [4]:
from vacuum_cleaner_world.environment import UnknownVacuumWorld
from vacuum_cleaner_world.agents import RandomizedReflexAgent

In [9]:
env = UnknownVacuumWorld()

trials = 100
scores = [env.simulate(RandomizedReflexAgent) for _ in range(trials)]

print('RandomizedReflexAgent had an average performance of {:.2f} over {} trials.'.format(
    sum(scores)/len(scores), 
    trials))

RandomizedReflexAgent had an average performance of 1998.34 over 100 trials.
