# Foundations of Artificial Intelligence (BSc)
## Week 2 — What is AI? What is an Agent? (AIMA Ch. 2)

Name:

Date of last update:


### Today’s goals
By the end of this notebook you should be able to:
- Explain what an **agent** is (in AI terms).
- Describe an **environment** and its key properties.
- Define **rationality** using performance measures and constraints.
- Implement and explain a very simple **reflex agent**.
- Practise **explainability**: explain *why* your agent acts the way it does.

### How to use this notebook
- Read the markdown cells first.
- Run code cells in order.
- Fill in the **TODO** sections.
- Answer the reflection questions in **your own words**.

### Reading
- Russell & Norvig (AIMA), Chapter 2: Agents

## 0. Setup
Run this cell first. If you get an error, ask for help.

In [None]:
import random
from typing import Dict, Tuple, List

random.seed(42)

print('Setup complete.')

## 1. Concepts: Agent, Environment, Percepts, Actions

In AIMA, an **agent** is anything that:
- **perceives** its environment through **sensors** (it receives percepts)
- **acts** on that environment through **actuators** (it takes actions)

A simple picture:

**Environment → Percepts → Agent → Actions → Environment**
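
The cycle in the picture can be sketched in a few lines of Python. This is a minimal illustration, not code from AIMA: the names `run_loop`, `sense_fn`, `agent_fn` and `act_fn` are assumptions made for this sketch, and the toy environment is just a counter of dirty items that the agent tries to bring down to zero.

```python
from typing import Callable, List

def run_loop(state: int,
             sense_fn: Callable[[int], bool],
             agent_fn: Callable[[bool], str],
             act_fn: Callable[[int, str], int],
             steps: int) -> List[str]:
    """Repeat the cycle: environment -> percept -> agent -> action -> environment."""
    history = []
    for _ in range(steps):
        percept = sense_fn(state)      # environment -> percept
        action = agent_fn(percept)     # agent: percept -> action
        state = act_fn(state, action)  # the action changes the environment
        history.append(action)
    return history

# Toy environment (hypothetical): the state is the number of dirty items left.
def sense_counter(state: int) -> bool:
    return state > 0                   # percept: "is anything still dirty?"

def simple_agent(percept: bool) -> str:
    return 'CLEAN' if percept else 'WAIT'

def apply_action(state: int, action: str) -> int:
    return state - 1 if action == 'CLEAN' and state > 0 else state

actions = run_loop(2, sense_counter, simple_agent, apply_action, steps=4)
print(actions)  # ['CLEAN', 'CLEAN', 'WAIT', 'WAIT']
```

Note that the agent function only ever sees the percept, never the state itself. The rest of this notebook keeps that separation.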

### Quick check (write your answers)
**Q1:** Is a calculator an agent? Why or why not?

**Q2:** Is a thermostat an agent? Why or why not?

Write answers below.

### Your answers
- Q1:
- Q2:


## 2. A Tiny Environment: 2×2 Vacuum World

We will use a very small **grid environment**:
- The agent is on one square.
- Each square is either **dirty** or **clean**.
- The agent can:
  - move up/down/left/right (a move into a wall leaves it where it is)
  - clean ("SUCK")

### Why this environment?
It is small enough to understand *every step* and still illustrates real AI ideas.


In [None]:
# Environment settings
ROWS = 2
COLS = 2

ACTIONS = ['UP', 'DOWN', 'LEFT', 'RIGHT', 'SUCK']

# We represent the world as a dictionary:
# world[(r, c)] = True means DIRTY
# world[(r, c)] = False means CLEAN

def make_random_world(rows: int, cols: int, dirt_prob: float = 0.7) -> Dict[Tuple[int,int], bool]:
    world = {}
    for r in range(rows):
        for c in range(cols):
            world[(r, c)] = (random.random() < dirt_prob)
    return world

def print_world(world: Dict[Tuple[int,int], bool], agent_pos: Tuple[int,int], rows: int, cols: int) -> None:
    # Simple text display
    for r in range(rows):
        row_cells = []
        for c in range(cols):
            is_dirty = world[(r, c)]
            if (r, c) == agent_pos:
                cell = 'A'  # agent
            else:
                cell = '.'
            cell += 'D' if is_dirty else 'C'
            row_cells.append(cell)
        print(' '.join(row_cells))
    print()

world = make_random_world(ROWS, COLS, dirt_prob=0.7)
agent_pos = (0, 0)

print('Initial world:')
print_world(world, agent_pos, ROWS, COLS)

## 3. Separating Environment from Agent

For explainability, we will separate:
- **Sense** (environment → percept)
- **Agent** (percept → action)
- **Act** (environment + action → new environment)

This separation helps you explain:
- what the agent knows
- what the agent decides
- how the environment changes


In [None]:
def sense(world: Dict[Tuple[int,int], bool], agent_pos: Tuple[int,int]) -> Dict[str, object]:
    """Return the percept. Here: location and whether current square is dirty."""
    r, c = agent_pos
    percept = {
        'pos': agent_pos,
        'is_dirty_here': world[(r, c)]
    }
    return percept

def act(world: Dict[Tuple[int,int], bool], agent_pos: Tuple[int,int], action: str, rows: int, cols: int) -> Tuple[Dict[Tuple[int,int], bool], Tuple[int,int]]:
    """Apply the action to the environment. Returns (new_world, new_agent_pos)."""
    r, c = agent_pos
    new_world = dict(world)  # copy
    new_pos = agent_pos

    if action == 'SUCK':
        # Clean the current square
        new_world[(r, c)] = False
        return new_world, new_pos

    if action == 'UP':
        if r > 0:
            new_pos = (r - 1, c)
    elif action == 'DOWN':
        if r < rows - 1:
            new_pos = (r + 1, c)
    elif action == 'LEFT':
        if c > 0:
            new_pos = (r, c - 1)
    elif action == 'RIGHT':
        if c < cols - 1:
            new_pos = (r, c + 1)

    return new_world, new_pos

percept = sense(world, agent_pos)
print('Example percept:', percept)
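
To see how the movement rules in `act` behave at the edges of the grid, here is a compact, table-driven sketch of the same idea. It is an alternative formulation written for illustration, not a replacement for the cell above; `MOVES` and `move` are names assumed for this sketch.

```python
from typing import Dict, Tuple

# Row/column offsets for each move action (same convention as act: row 0 is the top row).
MOVES: Dict[str, Tuple[int, int]] = {
    'UP': (-1, 0),
    'DOWN': (1, 0),
    'LEFT': (0, -1),
    'RIGHT': (0, 1),
}

def move(agent_pos: Tuple[int, int], action: str, rows: int, cols: int) -> Tuple[int, int]:
    """Return the new position, or the old one if the move would leave the grid."""
    dr, dc = MOVES[action]
    r, c = agent_pos[0] + dr, agent_pos[1] + dc
    if 0 <= r < rows and 0 <= c < cols:
        return (r, c)
    return agent_pos  # bumped into a wall: stay put

print(move((0, 0), 'UP', 2, 2))     # (0, 0)  blocked by the top wall
print(move((0, 0), 'DOWN', 2, 2))   # (1, 0)
print(move((1, 1), 'RIGHT', 2, 2))  # (1, 1)  blocked by the right wall
```

Keeping the offsets in one table makes it easier to check each direction against the grid convention.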

## 4. A Simple Reflex Agent

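A simple reflex agent chooses its action from the **current percept only**, using **if–else (condition–action) rules**. It keeps no memory of earlier percepts.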

### Reflex rule (very simple)
- If current square is dirty → SUCK
- Otherwise → move randomly

This is not “smart”, but it is a valid agent.

### TODO
Read the function and make sure you can explain it.


In [None]:
def reflex_agent(percept: Dict[str, object]) -> str:
    if percept['is_dirty_here']:
        return 'SUCK'
    else:
        return random.choice(['UP', 'DOWN', 'LEFT', 'RIGHT'])

# Test the agent decision once
test_percept = {'pos': (0,0), 'is_dirty_here': True}
print('If dirty ->', reflex_agent(test_percept))
test_percept = {'pos': (0,0), 'is_dirty_here': False}
print('If clean ->', reflex_agent(test_percept))
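
Because `reflex_agent` draws from the global `random` module, its moves depend on every random number used earlier in the notebook. One way to make a test repeatable is to pass in a private `random.Random` instance. The sketch below assumes you are free to add an `rng` parameter, which the notebook's agent does not have; the seed value 7 is arbitrary.

```python
import random
from typing import Dict

MOVE_ACTIONS = ['UP', 'DOWN', 'LEFT', 'RIGHT']

def reflex_agent_seeded(percept: Dict[str, object], rng: random.Random) -> str:
    """Same rule as reflex_agent, but the randomness comes from the rng passed in."""
    if percept['is_dirty_here']:
        return 'SUCK'
    return rng.choice(MOVE_ACTIONS)

clean_percept = {'pos': (0, 0), 'is_dirty_here': False}

# Two generators created with the same seed yield the same sequence of moves.
rng_a = random.Random(7)
rng_b = random.Random(7)
moves_a = [reflex_agent_seeded(clean_percept, rng_a) for _ in range(5)]
moves_b = [reflex_agent_seeded(clean_percept, rng_b) for _ in range(5)]
print(moves_a == moves_b)  # True
```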

## 5. Running a Simulation

We will run the agent for a number of steps.

### Performance measure
We need a way to measure how well the agent is doing.

For now:
- **+1** point for each clean square at each time step

This means the agent is rewarded for keeping the world clean.
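
A quick worked example of the arithmetic, using a hand-written sequence of worlds rather than a real run. The three snapshots and the helper `count_clean` are made up for this illustration.

```python
from typing import Dict, List, Tuple

World = Dict[Tuple[int, int], bool]  # True = dirty, False = clean

def count_clean(world: World) -> int:
    """Number of clean squares in one snapshot of the world."""
    return sum(1 for dirty in world.values() if not dirty)

# Three consecutive snapshots of a 2x2 world (hypothetical data).
snapshots: List[World] = [
    {(0, 0): True,  (0, 1): True, (1, 0): False, (1, 1): True},  # 1 clean
    {(0, 0): False, (0, 1): True, (1, 0): False, (1, 1): True},  # 2 clean
    {(0, 0): False, (0, 1): True, (1, 0): False, (1, 1): True},  # 2 clean
]

per_step = [count_clean(w) for w in snapshots]
total = sum(per_step)
best_possible = len(snapshots) * 4  # upper bound: all 4 squares clean at every step

print(per_step, total, best_possible)  # [1, 2, 2] 5 12
```

The upper bound is only reached if the world is already fully clean at the first step, so a real run will usually score below it.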


In [None]:
def performance(world: Dict[Tuple[int,int], bool], rows: int, cols: int) -> int:
    # Count clean squares
    clean = 0
    for r in range(rows):
        for c in range(cols):
            if not world[(r, c)]:
                clean += 1
    return clean

def run_simulation(agent_fn, steps: int = 10, rows: int = 2, cols: int = 2, dirt_prob: float = 0.7, show: bool = True):
    world = make_random_world(rows, cols, dirt_prob)
    agent_pos = (0, 0)
    total_score = 0

    if show:
        print('Starting simulation...')
        print_world(world, agent_pos, rows, cols)

    for t in range(steps):
        percept = sense(world, agent_pos)
        action = agent_fn(percept)
        world, agent_pos = act(world, agent_pos, action, rows, cols)

        score_t = performance(world, rows, cols)
        total_score += score_t

        if show:
            print(f'Time {t}: action={action}, score_this_step={score_t}')
            print_world(world, agent_pos, rows, cols)

    return total_score

score = run_simulation(reflex_agent, steps=8, rows=ROWS, cols=COLS, dirt_prob=0.7, show=True)
print('Total score:', score)

## 6. Explainability Task (Important)

Answer in your own words:

1. What information does the agent use to decide?
2. Why does the agent sometimes move "badly"?
3. What is the agent trying to maximise in this environment?

Write answers below.

### Your answers
- Q1:
- Q2:
- Q3:


## 7. Rationality Depends on the Performance Measure

Let’s change what we mean by “good”.

### New performance measure
- Clean squares are good
- BUT moving costs energy

We will implement:
- +1 for each clean square per step
- -1 for every move action (UP/DOWN/LEFT/RIGHT), even if the agent bumps into a wall and stays where it is
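
As a worked example of the two rules, take a few made-up steps and score each one by hand. The `(clean squares, action)` pairs below are hypothetical, and `MOVE_ACTIONS` is a name assumed for this sketch.

```python
MOVE_ACTIONS = {'UP', 'DOWN', 'LEFT', 'RIGHT'}

# Each pair: (number of clean squares after the step, action taken). Made-up values.
steps = [(1, 'RIGHT'), (2, 'SUCK'), (2, 'DOWN'), (3, 'SUCK')]

scores = []
for clean_squares, action in steps:
    penalty = 1 if action in MOVE_ACTIONS else 0  # only moves cost energy
    scores.append(clean_squares - penalty)

print(scores)       # [0, 2, 1, 3]
print(sum(scores))  # 6
```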

### TODO
Read the function `performance_with_energy` and check that it implements the two rules above.


In [None]:
def performance_with_energy(world: Dict[Tuple[int,int], bool], rows: int, cols: int, last_action: str) -> int:
    # TODO: start from clean squares score
    clean_score = 0
    for r in range(rows):
        for c in range(cols):
            if not world[(r, c)]:
                clean_score += 1

    # TODO: subtract 1 for move actions (UP/DOWN/LEFT/RIGHT)
    move_penalty = 0
    if last_action in ['UP', 'DOWN', 'LEFT', 'RIGHT']:
        move_penalty = 1

    return clean_score - move_penalty

def run_simulation_energy(agent_fn, steps: int = 10, rows: int = 2, cols: int = 2, dirt_prob: float = 0.7, show: bool = True):
    world = make_random_world(rows, cols, dirt_prob)
    agent_pos = (0, 0)
    total_score = 0

    if show:
        print('Starting simulation (energy cost)...')
        print_world(world, agent_pos, rows, cols)

    for t in range(steps):
        percept = sense(world, agent_pos)
        action = agent_fn(percept)
        world, agent_pos = act(world, agent_pos, action, rows, cols)

        score_t = performance_with_energy(world, rows, cols, action)
        total_score += score_t

        if show:
            print(f'Time {t}: action={action}, score_this_step={score_t}')
            print_world(world, agent_pos, rows, cols)

    return total_score

score2 = run_simulation_energy(reflex_agent, steps=8, rows=ROWS, cols=COLS, dirt_prob=0.7, show=True)
print('Total score (energy):', score2)

### Reflection
1. Did the agent’s behaviour change? Why or why not?
2. Is the reflex agent rational under this new performance measure?

Write answers below.

### Your answers
- Q1:
- Q2:


## 8. Environment Types (Light Practice)

AIMA describes environment properties such as:
- fully observable vs partially observable
- deterministic vs stochastic
- episodic vs sequential
- static vs dynamic
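
One way to keep classifications tidy is to record them in a small data structure. The sketch below is only a note-taking aid: `EnvironmentProfile` is a name assumed here, and the crossword-puzzle entry follows the classification AIMA gives for that task.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EnvironmentProfile:
    """One row of an environment-properties table (four of AIMA's dimensions)."""
    name: str
    fully_observable: bool
    deterministic: bool
    episodic: bool
    static: bool

    def describe(self) -> str:
        parts = [
            'fully observable' if self.fully_observable else 'partially observable',
            'deterministic' if self.deterministic else 'stochastic',
            'episodic' if self.episodic else 'sequential',
            'static' if self.static else 'dynamic',
        ]
        return f"{self.name}: " + ', '.join(parts)

crossword = EnvironmentProfile('Crossword puzzle', True, True, False, True)
print(crossword.describe())
# Crossword puzzle: fully observable, deterministic, sequential, static
```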

### Task
Classify each environment (write short answers):
1. This vacuum world
2. Chess
3. Driving in London
4. A recommendation system (Netflix/YouTube)


### Your answers
1. Vacuum world:
2. Chess:
3. Driving in London:
4. Recommendation system:


## 9. Optional Challenge (For fast finishers)

### Challenge A: Better Reflex Agent
Change the agent so that, when the current square is clean, it prefers to move toward a dirty square.

Hints:
- You will need to give the agent more information (more percepts).
- For example, sense *adjacent squares*.
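
As a starting point for the second hint, here is one possible way to extend the percept with the state of adjacent squares. It is a sketch under assumptions: the function name `sense_with_neighbours` and the percept key `dirty_neighbours` are invented here, and the agent rule that uses them is left to you.

```python
from typing import Dict, Tuple

World = Dict[Tuple[int, int], bool]  # True = dirty, False = clean

def sense_with_neighbours(world: World, agent_pos: Tuple[int, int]) -> Dict[str, object]:
    """Percept with the current square plus the dirt status of adjacent squares."""
    r, c = agent_pos
    offsets = {'UP': (-1, 0), 'DOWN': (1, 0), 'LEFT': (0, -1), 'RIGHT': (0, 1)}
    dirty_neighbours = {}
    for action, (dr, dc) in offsets.items():
        neighbour = (r + dr, c + dc)
        if neighbour in world:  # squares outside the grid are skipped
            dirty_neighbours[action] = world[neighbour]
    return {
        'pos': agent_pos,
        'is_dirty_here': world[agent_pos],
        'dirty_neighbours': dirty_neighbours,  # maps move action -> is that square dirty?
    }

demo_world = {(0, 0): False, (0, 1): True, (1, 0): False, (1, 1): True}
demo_percept = sense_with_neighbours(demo_world, (0, 0))
print(demo_percept['dirty_neighbours'])  # {'DOWN': False, 'RIGHT': True}
```

Keying the neighbours by move action means the agent can pick a direction without doing any coordinate arithmetic itself.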

### Challenge B: Bigger world
Try a 3×3 grid and see how performance changes.


## 10. Exit Ticket (What you should know by the end of today)

Answer briefly:
1. In one sentence, what is an **agent**?
2. In one sentence, what does **rational** mean in AI?
3. Name one thing that makes real environments harder than this vacuum world.


### Your exit ticket
1.
2.
3.
