## Intelligent Agents

In [22]:
import numpy as np
from itertools import product

### Agents and Environments

#### Agent
An agent is anything that perceives its environment through sensors and acts upon the environment through actuators.

#### Percept
The agent's perceptual inputs at any given instant.

An agent's percept sequence is the complete history of everything the agent has ever perceived.
  * The sequence keeps growing for as long as the agent runs, so its length is unbounded.
  
Abstractly, an agent is a function that maps the current percept sequence to an action.
  * The set of possible percept sequences is infinite; hence, writing out an exhaustive table for the agent function is infeasible.
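
To get a feel for how quickly such a table grows, the sketch below counts the entries a lookup table would need if it covered every percept sequence up to a given length. The function name `table_entries` and the choice of 4 distinct percepts (two locations times two dirt statuses, as in a two-square vacuum world) are assumptions made for illustration.

```python
def table_entries(num_percepts: int, max_length: int) -> int:
    """Count percept sequences of length 1..max_length over num_percepts symbols."""
    return sum(num_percepts ** t for t in range(1, max_length + 1))


# Hypothetical two-square vacuum world: 2 locations x 2 dirt statuses = 4 percepts.
for lifetime in (1, 5, 10, 20):
    print(lifetime, table_entries(4, lifetime))
```

Even with a bound on the agent's lifetime, the count grows exponentially in the sequence length.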
  

### Rationality
An agent's actions cause the environment to go through a sequence of states; the agent behaves rationally if that sequence is as desirable as can be expected under the **performance measure**.

---

As a general rule, performance measures should be designed according to what one actually wants in the environment rather than according to how one thinks the agent should behave.

Rationality depends on

1. performance measure
2. prior knowledge of the environment
3. possible actions
4. percept sequence

_For each possible percept sequence, a rational agent should select an action that is expected to maximize its performance measure, given the evidence provided by the percept sequence and the prior knowledge of the environment._

#### Rationality vs Perfection
Rationality maximizes expected performance while perfection maximizes actual performance.

A perfect agent is impossible in practice since it must be omniscient. An omniscient agent knows the **actual** outcomes of its actions, not expected outcomes.
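
The difference can be sketched with a toy decision between two actions. The outcome table `OUTCOMES`, the action names, and the scores are all hypothetical; the point is only that the rational choice uses expected scores, while the omniscient choice needs the actual outcome in advance.

```python
import random

# Hypothetical outcome model: each action maps to (probability, score) pairs.
OUTCOMES = {
    "safe": [(1.0, 1.0)],
    "risky": [(0.5, 10.0), (0.5, -10.0)],
}


def expected_score(action):
    """Expected performance of an action under the outcome model."""
    return sum(p * score for p, score in OUTCOMES[action])


def rational_choice():
    """Pick the action with the highest expected score."""
    return max(OUTCOMES, key=expected_score)


def sample_outcomes(rng):
    """Draw the score each action would actually produce in one run."""
    actual = {}
    for action, dist in OUTCOMES.items():
        probs = [p for p, _ in dist]
        scores = [s for _, s in dist]
        actual[action] = rng.choices(scores, weights=probs)[0]
    return actual


def omniscient_choice(actual):
    """Pick the action with the highest actual score (requires knowing the future)."""
    return max(actual, key=actual.get)


rng = random.Random(0)
actual = sample_outcomes(rng)
print("rational:", rational_choice())
print("omniscient:", omniscient_choice(actual), "given", actual)
```

The rational agent always picks `safe` here, even in runs where `risky` would have paid off; that is not a failure of rationality.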

#### Autonomy
To the extent that an agent relies on the prior knowledge of its designer rather than its own percepts, we say that the agent lacks autonomy.

### Task Environment (PEAS)
Task environments are essentially problems to which the agents are the solutions.

1. Performance
2. Environment
3. Actuators
4. Sensors

In designing an agent, we first need to specify the task environment.
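
A PEAS description can be written down as a small record before any agent code exists. The `PEAS` class and the entries for a two-square vacuum world below are illustrative assumptions, not a canonical specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PEAS:
    performance: tuple
    environment: tuple
    actuators: tuple
    sensors: tuple


# Illustrative description of a two-square vacuum world (entries are assumptions).
vacuum_world = PEAS(
    performance=("squares cleaned", "actions taken"),
    environment=("two squares", "dirt"),
    actuators=("move left", "move right", "suck"),
    sensors=("location sensor", "dirt sensor"),
)
print(vacuum_world)
```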

#### Fully Observable vs. Partially Observable
* **Fully observable:** The sensors give access to the complete state of the environment.
* **Partially observable:** The sensors give access to only part of the state of the environment.

#### Single Agent vs. Multi Agent
An agent $A$ should treat an object $B$ as an agent if $B$'s behavior is best described as maximizing a performance measure whose value depends on $A$'s behavior (so there must be some direct or indirect dependence between the agents).

#### Deterministic vs. Stochastic
If the next state is completely determined by the current state and the agent's action, the environment is deterministic. On the other hand, if there is randomness involved in determining the next state, the environment is stochastic.
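
As a minimal sketch, the two cases can be contrasted with a pair of transition functions on a 1-D position. The function names and the `slip_prob` parameter are hypothetical.

```python
import random


def deterministic_step(position, action):
    """Next state is fully determined by the current state and the action."""
    return position + (1 if action == "right" else -1)


def stochastic_step(position, action, rng, slip_prob=0.2):
    """With probability slip_prob the move fails and the state is unchanged."""
    if rng.random() < slip_prob:
        return position
    return deterministic_step(position, action)


rng = random.Random(42)
print(deterministic_step(0, "right"))
print([stochastic_step(0, "right", rng) for _ in range(10)])
```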

---

Consider a partially observable environment. There is uncertainty related to the unobserved parts of the environment.

If an environment is nondeterministic or partially observable, then it is **uncertain**.

#### Episodic vs. Sequential
In episodic environments, the timeline is divided into episodes and **different episodes are independent of each other.** However, in sequential environments, the current state may affect **all future states.**

An environment may be episodic where each episode is sequential in itself.

#### Static vs. Dynamic
If the environment changes while the agent is making a decision, it is dynamic; otherwise, it is static.
  * Dynamic environments are continuously asking the agent what it wants to do; if it hasn't decided yet, **that counts as doing nothing.**
  
---

An environment is semidynamic if the environment itself doesn't change with the passage of time but the agent's performance score does.

#### Discrete vs. Continuous
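The distinction applies to the state of the environment (e.g. discrete or continuous space), to the way time is handled, and to the agent's percepts and actions.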

#### Known vs. Unknown
This distinction is not about the environment itself or what the agent observes of it, but about the agent's (or designer's) knowledge of the rules of the environment. In a known environment, the outcomes of all actions are given (or the outcome probabilities, if the environment is stochastic).

### Structure of Agents

#### Agent Program
Takes the current percept as input and returns the current action.

If the agent needs to rely on the percept sequence, the program has to store the past percepts.

#### Simple Reflex Agent
Takes an action according to the current percept, ignoring the rest of the percept history.
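
A minimal sketch for a two-square vacuum world, assuming the percept is a `(location, status)` pair with locations `"A"` and `"B"`. These names are chosen for illustration and differ from the grid-based `Agent` class used in the exercises.

```python
def reflex_vacuum_agent(percept):
    """Choose an action from the current percept only: (location, status)."""
    location, status = percept
    if status == "dirty":
        return "suck"
    return "right" if location == "A" else "left"


for percept in [("A", "dirty"), ("A", "clean"), ("B", "dirty"), ("B", "clean")]:
    print(percept, "->", reflex_vacuum_agent(percept))
```

Because nothing is remembered between calls, this agent keeps moving back and forth even after both squares are clean.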

#### Model-based Reflex Agent
Keeps an internal state, built from the percept history, that describes the currently unobserved parts of the environment.
When deciding, the agent uses a model of "how the world works" to update this internal state and choose an action.

These agents need to know two types of information about the world:
1. How does the world evolve independently of the agent?
2. How do the agent's own actions affect the world?
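
The sketch below adds an internal state to a two-square vacuum agent. It assumes percepts of the form `(location, status)` and a world in which dirt never reappears on its own; both are assumptions made for the example, and the class name is hypothetical.

```python
class ModelBasedVacuumAgent:
    """Two-square vacuum agent that remembers what it has seen."""

    def __init__(self):
        # Internal state: believed status of each square (None = not yet observed).
        self.model = {"A": None, "B": None}

    def act(self, percept):
        location, status = percept
        # Update the model from the current percept.
        self.model[location] = status
        if status == "dirty":
            # Effect of the agent's own action: sucking leaves this square clean.
            self.model[location] = "clean"
            return "suck"
        if all(s == "clean" for s in self.model.values()):
            return "noop"  # Relies on the assumption that dirt never reappears.
        return "right" if location == "A" else "left"


agent = ModelBasedVacuumAgent()
for percept in [("A", "dirty"), ("A", "clean"), ("B", "clean")]:
    print(percept, "->", agent.act(percept))
```

Unlike a pure reflex agent, this one can stop once its model says every square is clean.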

#### Goal-based Agent
The agent has goal information that describes desirable states. Search and planning are used to find action sequences that reach the goal.

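Goal-based agents are generally more flexible than reflex agents, because the knowledge behind their decisions is represented explicitly: when the environment or the goal changes, that knowledge can be updated instead of rewriting whole tables of condition-action rules.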
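
As a sketch of the search step, the function below uses breadth-first search to find an action sequence from a start state to a goal state. The corridor world and the names `plan_to_goal` and `corridor_successors` are hypothetical; changing the goal only changes an argument, not the agent.

```python
from collections import deque


def plan_to_goal(start, goal, successors):
    """Breadth-first search returning a list of actions from start to goal, or None."""
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, actions = frontier.popleft()
        if state == goal:
            return actions
        for action, next_state in successors(state):
            if next_state not in visited:
                visited.add(next_state)
                frontier.append((next_state, actions + [action]))
    return None


def corridor_successors(position, length=5):
    """Hypothetical 1-D corridor with cells 0..length-1."""
    moves = []
    if position > 0:
        moves.append(("left", position - 1))
    if position < length - 1:
        moves.append(("right", position + 1))
    return moves


print(plan_to_goal(0, 3, corridor_successors))
print(plan_to_goal(4, 1, corridor_successors))
```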

#### Utility-based Agent
Goal-based agents split states into two classes: goal and non-goal. Utility-based agents instead use a utility function that assigns a score to each state, and try to maximize it.

If the utility function is chosen so that it agrees with the external performance measure, then an agent that maximizes its utility is a rational one.

A utility-based agent chooses the action that maximizes expected utility, computed from the probability and the utility of each possible outcome.
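
In symbols, one common way to write this is the following; the notation $EU$, $P$, $U$, and the evidence $e$ is assumed here and does not appear elsewhere in these notes:

$$
EU(a \mid e) = \sum_{s'} P(s' \mid a, e)\, U(s'), \qquad a^{*} = \arg\max_{a} EU(a \mid e)
$$

For example, if an action leads to a state of utility 10 with probability 0.3 and to a state of utility 2 with probability 0.7, its expected utility is $0.3 \cdot 10 + 0.7 \cdot 2 = 4.4$.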

#### Learning Agent
* **Performance element:** Select the best external action.
* **Critic:** Give feedback about the agent's external actions.
* **Learning element:** Make improvements on the performance element according to the feedback from the critic.
* **Problem generator:** Suggest possibly suboptimal but new actions so that the agent explores new possibilities which ultimately may lead to better outcomes.

All of the previous agents consist only of the performance element. Hence, any of the previous designs can be turned into a learning agent by adding a critic and a learning element.
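
The four components can be sketched on a toy problem with a fixed set of actions. Everything below is an assumption made for illustration: the class name, the fixed scores returned by `critic`, and the choice to explore with a fixed probability.

```python
import random


class LearningAgent:
    """Toy learning agent over a fixed action set (a bandit-style sketch)."""

    def __init__(self, actions, explore_prob=0.1, seed=0):
        self.estimates = {a: 0.0 for a in actions}  # estimated value per action
        self.counts = {a: 0 for a in actions}
        self.explore_prob = explore_prob
        self.rng = random.Random(seed)

    def performance_element(self):
        """Select the action currently believed to be best."""
        return max(self.estimates, key=self.estimates.get)

    def problem_generator(self):
        """Suggest a random, possibly suboptimal action to try."""
        return self.rng.choice(list(self.estimates))

    def choose(self):
        """Explore with probability explore_prob, otherwise exploit."""
        if self.rng.random() < self.explore_prob:
            return self.problem_generator()
        return self.performance_element()

    def learning_element(self, action, feedback):
        """Move the estimate for `action` toward the feedback (running mean)."""
        self.counts[action] += 1
        self.estimates[action] += (feedback - self.estimates[action]) / self.counts[action]


def critic(action):
    """Hypothetical fixed performance standard that scores each action."""
    return {"a": 0.2, "b": 1.0, "c": 0.5}[action]


agent = LearningAgent(["a", "b", "c"], explore_prob=0.3)
for _ in range(200):
    action = agent.choose()
    agent.learning_element(action, critic(action))
print(agent.performance_element(), agent.estimates)
```

Without the problem generator, the agent would settle on the first action it tried and never discover the better ones.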

### State Representation

#### Atomic
Each individual state is indivisible and has no internal configuration.
1. search and game-playing algorithms
2. HMMs
3. Markov Decision Processes, etc.

#### Factored
Each state consists of a fixed set of variables or attributes, each of which has a value. In this representation, different states may have things in common (certain variables having the same value).
1. constraint satisfaction
2. Bayes nets
3. planning, etc.

#### Structured
States contain objects, which may have attributes of their own, together with the relationships among those objects.
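
The three representations can be compared on one small world. The variable names and the encoding of relations as tuples are assumptions chosen for this sketch.

```python
# Atomic: the state is an opaque label; only equality checks are possible.
atomic_state = "S17"

# Factored: the state is an assignment of values to a fixed set of variables.
factored_state = {"location": "A", "dirt_A": True, "dirt_B": False}
other_factored = {"location": "B", "dirt_A": True, "dirt_B": False}

# Two factored states can share part of their description.
shared = {k for k in factored_state if factored_state[k] == other_factored[k]}

# Structured: the state holds objects and explicit relations between them.
structured_state = {
    "objects": {"robot", "square_A", "square_B"},
    "relations": {("at", "robot", "square_A"), ("adjacent", "square_A", "square_B")},
}

print(sorted(shared))
print(("at", "robot", "square_A") in structured_state["relations"])
```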

## Exercises

### Exercise 2.8

In [32]:
def make_state(m, n):
    return np.zeros((m, n), dtype=int)

class Agent:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        
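    def take_action(self, state):
        # Policy: stop once the whole grid is clean, clean the current cell if
        # it is dirty, otherwise move right from column 0 and left from any
        # other column. The agent never changes its row (self.x).
        if np.sum(state) == 0:
            return 'noop'
        elif state[self.x, self.y] == 1:
            return 'eat'
        elif self.y == 0:
            return 'right'
        else:
            return 'left'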
        
class Simulator:
    def __init__(self, state, agent):
        self.cost = 0
        self.state = state
        self.agent = agent
        
    def simulate(self, t=10):
        for i in range(t):
            action = self.agent.take_action(self.state)
            if action == 'eat':
                x = self.agent.x
                y = self.agent.y
                self.state[x, y] = 0
                self.cost += 1
            elif action == 'left':
                self.agent.y = max(self.agent.y - 1, 0)
                self.cost += 1
            elif action == 'right':
                # Clamp to the last valid column index (shape[1] - 1).
                self.agent.y = min(self.agent.y + 1, self.state.shape[1] - 1)
                self.cost += 1
            elif action == 'noop':
                pass
            
            # Uncomment to trace each step:
            #header = 'Step {}'.format(i)
            #print(header)
            #print('-'*len(header))
            #print('({}, {})'.format(self.agent.x, self.agent.y))
            #print(self.state)
        #print('Final cost: {}'.format(self.cost))
        
state = make_state(5, 5)
#state[3, 4] = 1
state[4, 4] = 1
state[4, 2] = 1
agent = Agent(4, 4)
simulator = Simulator(state, agent)
simulator.simulate()

### Exercise 2.9

In [43]:
state = make_state(1, 2)
total_cost = 0
total_sim = 0

for conf in product((0, 1), repeat=2):
    for y in range(2):
        state[0] = conf
        agent = Agent(0, y)
        simulator = Simulator(state, agent)
        
        print('State: {}'.format(state))
        print('Agent: {}'.format((agent.x, agent.y)))
         
        simulator.simulate()
        cost = simulator.cost
        total_cost += cost
        total_sim += 1
    
        print('Cost: {}'.format(cost))
        
print('Avg. cost: {:.2f}'.format(total_cost/total_sim))

State: [[0 0]]
Agent: (0, 0)
Cost: 0
State: [[0 0]]
Agent: (0, 1)
Cost: 0
State: [[0 1]]
Agent: (0, 0)
Cost: 2
State: [[0 1]]
Agent: (0, 1)
Cost: 1
State: [[1 0]]
Agent: (0, 0)
Cost: 1
State: [[1 0]]
Agent: (0, 1)
Cost: 2
State: [[1 1]]
Agent: (0, 0)
Cost: 3
State: [[1 1]]
Agent: (0, 1)
Cost: 3
Avg. cost: 1.50
