# Foundations of Artificial Intelligence (BSc)
## Week 2 — What is AI? What is an Agent? (AIMA Ch. 2)

Name: Mukhammadsaiid Norbaev

Date of last update: 08/02/2026


### Today’s goals
By the end of this notebook you should be able to:
- Explain what an **agent** is (in AI terms).
- Describe an **environment** and its key properties.
- Define **rationality** using performance measures and constraints.
- Implement and explain a very simple **reflex agent**.
- Practise **explainability**: explain *why* your agent acts the way it does.

### How to use this notebook
- Read the markdown cells first.
- Run code cells in order.
- Fill in the **TODO** sections.
- Answer the reflection questions in **your own words**.

### Reading
- Russell & Norvig (AIMA), Chapter 2: Agents

## 0. Setup
Run this cell first. If something errors, ask for help.

In [1]:
import random
from typing import Dict, Tuple, List

random.seed(42)

print('Setup complete.')

Setup complete.


## 1. Concepts: Agent, Environment, Percepts, Actions

In AIMA, an **agent** is anything that:
- **perceives** its environment (gets percepts)
- **acts** in the environment (takes actions)

A simple picture:

**Environment → Percepts → Agent → Actions → Environment**
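
The loop in the picture can be sketched in a few lines of Python. This is only an illustration, not the vacuum world used later: the one-dimensional environment and the names `sense_position`, `step_toward_zero`, `apply_move` and `run_loop` are assumptions made up for this example.

```python
from typing import List

def sense_position(state: int) -> int:
    """Environment -> percept: the agent is told its current position."""
    return state

def step_toward_zero(percept: int) -> str:
    """Agent: map the percept to an action."""
    if percept > 0:
        return 'LEFT'
    if percept < 0:
        return 'RIGHT'
    return 'STAY'

def apply_move(state: int, action: str) -> int:
    """Environment + action -> new environment state."""
    if action == 'LEFT':
        return state - 1
    if action == 'RIGHT':
        return state + 1
    return state

def run_loop(state: int, steps: int) -> List[str]:
    """Repeat perceive -> decide -> act and record the actions taken."""
    history = []
    for _ in range(steps):
        percept = sense_position(state)
        action = step_toward_zero(percept)
        state = apply_move(state, action)
        history.append(action)
    return history

print(run_loop(state=2, steps=4))  # ['LEFT', 'LEFT', 'STAY', 'STAY']
```

Each pass through the loop is one trip around the arrow diagram: the environment produces a percept, the agent turns it into an action, and the action changes the environment.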

### Quick check (write your answers)
**Q1:** Is a calculator an agent? Why or why not?

**Q2:** Is a thermostat an agent? Why or why not?

Write answers below.

### Your answers
- Q1: Yes, it can be viewed as an agent. Its environment is the numbers and maths operators entered by the user: it takes that data as its percept, processes it, and acts by giving the result, all within that environment.
- Q2: Yes, it can also be considered an agent. Its environment is the temperature it measures: its percept is the temperature reading, and its action changes that temperature accordingly.


## 2. A Tiny Environment: 2×2 Vacuum World

We will use a very small **grid environment**:
- The agent is on one square.
- Each square is either **dirty** or **clean**.
- The agent can:
  - move up/down/left/right
  - clean ("SUCK")

### Why this environment?
It is small enough to understand *every step* and still illustrates real AI ideas.


In [None]:
# Environment settings
ROWS = 2
COLS = 2

ACTIONS = ['UP', 'DOWN', 'LEFT', 'RIGHT', 'SUCK']

# We represent the world as a dictionary:
# world[(r, c)] = True means DIRTY
# world[(r, c)] = False means CLEAN

def make_random_world(rows: int, cols: int, dirt_prob: float = 0.7) -> Dict[Tuple[int,int], bool]:
    # dirt_prob - controls the probability that each cell starts dirty
    world = {}
    for r in range(rows):
        for c in range(cols):
            world[(r, c)] = (random.random() < dirt_prob)
            # random.random() returns a float in [0.0, 1.0), so the cell is
            # True (dirty) if the float is < dirt_prob, otherwise False (clean)
    return world

def print_world(world: Dict[Tuple[int,int], bool], agent_pos: Tuple[int,int], rows: int, cols: int) -> None:
    # Simple text display
    for r in range(rows):
        row_cells = []
        for c in range(cols): # column iteration; left to right
            is_dirty = world[(r, c)] # boolean, checks whether value at this key is true(dirty) or false(clean)
            if (r, c) == agent_pos: # agent position check 
                cell = 'A'  # agent is here
            else:
                cell = '.' # if agent is not here
            cell += 'D' if is_dirty else 'C' # adding dirt status ex: if A and is_dirty => "AD", else ".C"
            row_cells.append(cell) # adds the finished cell string
        print(' '.join(row_cells))
        # prints one row, space-separated
    print()


def print_world_verbose(world, agent_pos, rows, cols):
    # Same information as print_world, but one labelled line per square
    print("Initial world:\n")
    for r in range(rows):
        for c in range(cols):
            is_dirty = world[(r, c)]
            agent = "A" if (r, c) == agent_pos else "."
            state = "DIRTY" if is_dirty else "CLEAN"
            print(f"({r}, {c}): {agent}  |  {state} ({is_dirty})")
        print()

world = make_random_world(ROWS, COLS, dirt_prob=0.7)
agent_pos = (0, 0)

print('Initial world:')
print_world(world, agent_pos, ROWS, COLS)

print(world)

print_world_verbose(world, agent_pos, ROWS, COLS)

Initial world:
AC .D
.D .D

{(0, 0): False, (0, 1): True, (1, 0): True, (1, 1): True}
Initial world:

(0, 0): A  |  CLEAN (False)
(0, 1): .  |  DIRTY (True)

(1, 0): .  |  DIRTY (True)
(1, 1): .  |  DIRTY (True)



## 3. Separating Environment from Agent

For explainability, we will separate:
- **Sense** (environment → percept)
- **Agent** (percept → action)
- **Act** (environment + action → new environment)

This separation helps you explain:
- what the agent knows
- what the agent decides
- how the environment changes


In [8]:
def sense(world: Dict[Tuple[int,int], bool], agent_pos: Tuple[int,int]) -> Dict[str, object]:
    """Return the percept. Here: location and whether current square is dirty."""
    # Separation of concerns is a core AI idea here: the agent never gets
    # direct access to the world. It receives only a percept from the
    # environment (where it is and whether its current tile is dirty).
    # sense() simulates that boundary: it is the contract that limits what
    # the agent is allowed to know about the world.
    r, c = agent_pos
    percept = {
        'pos': agent_pos,
        'is_dirty_here': world[(r, c)]
    }
    return percept

def act(world: Dict[Tuple[int,int], bool], agent_pos: Tuple[int,int], action: str, rows: int, cols: int) -> Tuple[Dict[Tuple[int,int], bool], Tuple[int,int]]:
    """Apply the action to the environment. Returns (new_world, new_agent_pos)."""
    r, c = agent_pos
    new_world = dict(world)  # copy
    new_pos = agent_pos

    if action == 'SUCK':
        # Clean the current square
        new_world[(r, c)] = False # false means clean
        return new_world, new_pos # changed world is returned, given tile has been cleaned

    if action == 'UP':
        if r > 0: # this checks whether it is possible to go up
            new_pos = (r - 1, c)
    elif action == 'DOWN':
        if r < rows - 1:
            new_pos = (r + 1, c)
    elif action == 'LEFT':
        if c > 0:
            new_pos = (r, c - 1)
    elif action == 'RIGHT':
        if c < cols - 1:
            new_pos = (r, c + 1)

    return new_world, new_pos

percept = sense(world, agent_pos)
print('Example percept:', percept)

Example percept: {'pos': (0, 0), 'is_dirty_here': False}
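
The wall checks in `act` can be easy to miss, so here is a condensed restatement of just the movement rule. It is a sketch under assumptions: the helper `move` and the `OFFSETS` table are hypothetical names for this example and are not part of the notebook's code.

```python
from typing import Tuple

# Row/column offsets for each move; (0, 0) is the top-left square.
OFFSETS = {'UP': (-1, 0), 'DOWN': (1, 0), 'LEFT': (0, -1), 'RIGHT': (0, 1)}

def move(pos: Tuple[int, int], action: str, rows: int, cols: int) -> Tuple[int, int]:
    """Return the new position, or the old one if the move would leave the grid."""
    dr, dc = OFFSETS[action]
    r, c = pos[0] + dr, pos[1] + dc
    if 0 <= r < rows and 0 <= c < cols:
        return (r, c)
    return pos

print(move((0, 0), 'UP', rows=2, cols=2))    # (0, 0): blocked by the top wall
print(move((0, 0), 'DOWN', rows=2, cols=2))  # (1, 0): moves one row down
```

A blocked move is not an error: the agent simply stays where it is, which matters later when moves start to cost points.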


## 4. A Simple Reflex Agent

A reflex agent uses **if–else rules**.

### Reflex rule (very simple)
- If current square is dirty → SUCK
- Otherwise → move randomly

This is not “smart”, but it is a valid agent.

### TODO
Read the function and make sure you can explain it.


In [11]:
def reflex_agent(percept: Dict[str, object]) -> str:
    """Decide using only percept['is_dirty_here']; 'pos' is ignored."""
    # percept: Dict[str, object] is a type hint, not an enforced contract:
    # it documents that keys are str and that values can be any object.
    if percept['is_dirty_here']:
        return 'SUCK'
    else:
        return random.choice(['UP', 'DOWN', 'LEFT', 'RIGHT'])

# Test the agent decision once
test_percept = {'pos': (0,0), 'is_dirty_here': True}
print('If dirty ->', reflex_agent(test_percept))
test_percept = {'pos': (0,0), 'is_dirty_here': False}
print('If clean ->', reflex_agent(test_percept))

If dirty -> SUCK
If clean -> UP
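
The same reflex behaviour can also be written as an ordered list of condition-action rules instead of an if-else chain. This is a sketch of that idea, not the notebook's implementation: `RULES`, `Rule`, `MOVES` and `rule_table_agent` are names assumed for this example.

```python
import random
from typing import Callable, Dict, List, Tuple

MOVES = ['UP', 'DOWN', 'LEFT', 'RIGHT']

# Each rule is (condition, action chooser); the first matching condition wins.
Rule = Tuple[Callable[[Dict[str, object]], bool], Callable[[], str]]

RULES: List[Rule] = [
    (lambda p: bool(p['is_dirty_here']), lambda: 'SUCK'),
    (lambda p: True, lambda: random.choice(MOVES)),  # default rule
]

def rule_table_agent(percept: Dict[str, object]) -> str:
    """Return the action of the first rule whose condition matches the percept."""
    for condition, choose_action in RULES:
        if condition(percept):
            return choose_action()
    raise ValueError('no rule matched')  # unreachable while a default rule exists

print(rule_table_agent({'pos': (0, 0), 'is_dirty_here': True}))  # SUCK
```

Writing the rules as data makes the agent easier to explain: the answer to "why did it do that?" is always "because this rule was the first one that matched".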


## 5. Running a Simulation

We will run the agent for a number of steps.

### Performance measure
We need a way to say if the agent is doing well.

For now:
- **+1** point for each clean square at each time step

This means the agent is rewarded for keeping the world clean.
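
As a worked example of this measure, the sketch below scores a short, hand-written history of a 2×2 world. The three states and the helper name `count_clean` are made up for illustration.

```python
from typing import Dict, List, Tuple

World = Dict[Tuple[int, int], bool]  # True = dirty, False = clean

def count_clean(world: World) -> int:
    """Score for one time step: the number of clean squares."""
    return sum(1 for dirty in world.values() if not dirty)

# A hypothetical three-step history of a 2x2 world.
history: List[World] = [
    {(0, 0): False, (0, 1): True, (1, 0): True, (1, 1): True},   # 1 clean
    {(0, 0): False, (0, 1): True, (1, 0): True, (1, 1): True},   # 1 clean (agent only moved)
    {(0, 0): False, (0, 1): True, (1, 0): False, (1, 1): True},  # 2 clean
]

step_scores = [count_clean(w) for w in history]
print(step_scores, sum(step_scores))  # [1, 1, 2] 4
```

Note that a step in which the agent only moves still earns points for squares that are already clean, so cleaning early pays off at every later step.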


In [14]:
def performance(world: Dict[Tuple[int,int], bool], rows: int, cols: int) -> int:
    # world contains all the tiles with bool value 
    # Count clean squares
    clean = 0
    for r in range(rows):
        for c in range(cols):
            if not world[(r, c)]:  # False means clean
                clean += 1
    return clean

def run_simulation(agent_fn, steps: int = 10, rows: int = 2, cols: int = 2, dirt_prob: float = 0.7, show: bool = True):
    world = make_random_world(rows, cols, dirt_prob)
    agent_pos = (0, 0)
    total_score = 0

    if show:
        print('Starting simulation...')
        print_world(world, agent_pos, rows, cols)

    for t in range(steps):
        percept = sense(world, agent_pos)
        # sense() returns only local information, never the whole world:
        #   {'pos': agent_pos, 'is_dirty_here': True or False}
        # On the first iteration agent_pos is (0, 0). The percept is what the
        # agent senses at its feet, not a map of the house.
        action = agent_fn(percept)  # agent_fn is whatever agent is passed in; here it is reflex_agent
        # action is a str: 'SUCK' or a random move ('UP', 'DOWN', 'LEFT', 'RIGHT')
        world, agent_pos = act(world, agent_pos, action, rows, cols)  # moves or cleans; returns new world and position

        score_t = performance(world, rows, cols)  # number of clean tiles in the updated world
        total_score += score_t  # accumulates clean tiles over time (state-based reward, not per-action)
        # Example: total_score = 1 + 2 + 2 + 3 + 4 = 12.
        # This does not count cleaning actions; it sums the quality of the
        # whole state at every time step.


        if show:
            print(f'Time {t}: action={action}, score_this_step={score_t}')
            # show defaults to True; passing show=False suppresses this output,
            # which is useful when only the returned total_score is needed
            print_world(world, agent_pos, rows, cols)

    return total_score

score = run_simulation(reflex_agent, steps=8, rows=ROWS, cols=COLS, dirt_prob=0.7, show=True)
print('Total score:', score)

Starting simulation...
AD .D
.D .D

Time 0: action=SUCK, score_this_step=1
AC .D
.D .D

Time 1: action=DOWN, score_this_step=1
.C .D
AD .D

Time 2: action=SUCK, score_this_step=2
.C .D
AC .D

Time 3: action=LEFT, score_this_step=2
.C .D
AC .D

Time 4: action=UP, score_this_step=2
AC .D
.C .D

Time 5: action=DOWN, score_this_step=2
.C .D
AC .D

Time 6: action=DOWN, score_this_step=2
.C .D
AC .D

Time 7: action=DOWN, score_this_step=2
.C .D
AC .D

Total score: 14
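
A single run says little about a randomised agent, so it can help to repeat the simulation silently and look at the average. The sketch below is a compact, self-contained restatement of the same rules under assumptions: `run_quiet` and `OFFSETS` are hypothetical names, and a local `random.Random` generator is used so the notebook's global seed is left alone. In the notebook itself, the equivalent would be calling `run_simulation(reflex_agent, show=False)` in a loop.

```python
import random
from typing import Dict, Tuple

World = Dict[Tuple[int, int], bool]  # True = dirty
OFFSETS = {'UP': (-1, 0), 'DOWN': (1, 0), 'LEFT': (0, -1), 'RIGHT': (0, 1)}

def run_quiet(steps: int, rows: int, cols: int, dirt_prob: float, rng: random.Random) -> int:
    """One silent run of the simple reflex agent; returns the total score."""
    world: World = {(r, c): rng.random() < dirt_prob
                    for r in range(rows) for c in range(cols)}
    pos = (0, 0)
    total = 0
    for _ in range(steps):
        if world[pos]:                       # reflex rule: dirty -> SUCK
            world[pos] = False
        else:                                # otherwise a random move
            dr, dc = OFFSETS[rng.choice(list(OFFSETS))]
            r, c = pos[0] + dr, pos[1] + dc
            if 0 <= r < rows and 0 <= c < cols:
                pos = (r, c)                 # blocked moves leave pos unchanged
        total += sum(1 for dirty in world.values() if not dirty)
    return total

rng = random.Random(0)  # local generator, independent of random.seed(42)
scores = [run_quiet(8, 2, 2, 0.7, rng) for _ in range(1000)]
print('mean score over 1000 runs:', sum(scores) / len(scores))
```

The mean gives a fairer picture of the agent than any one trace, because both the starting dirt and the agent's moves are random.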


## 6. Explainability Task (Important)

Answer in your own words:

1. What information does the agent use to decide? 
2. Why does the agent sometimes move "badly"?
3. What is the agent trying to maximise in this environment?

Write answers below.

### Your answers
- Q1: It uses only the percept provided by the environment: its location and whether the current square is clean or dirty. The agent cannot see the whole world.
- Q2: When its square is clean it moves at random, so it may go to a square that is already clean, where there is nothing to do. In that case the step is wasted.
- Q3: The performance measure: the number of clean tiles, summed over all time steps.


## 7. Rationality Depends on the Performance Measure

Let’s change what we mean by “good”.

### New performance measure
- Clean squares are good
- BUT moving costs energy

We will implement:
- +1 for each clean square per step
- -1 for every move action

### TODO
Complete the function `performance_with_energy`.


In [15]:
def performance_with_energy(world: Dict[Tuple[int,int], bool], rows: int, cols: int, last_action: str) -> int:
    # TODO: start from clean squares score
    clean_score = 0
    for r in range(rows):
        for c in range(cols):
            if not world[(r, c)]:  # False means clean
                clean_score += 1

    # TODO: subtract 1 for move actions (UP/DOWN/LEFT/RIGHT)
    move_penalty = 0
    if last_action in ['UP', 'DOWN', 'LEFT', 'RIGHT']:
        move_penalty = 1

    return clean_score - move_penalty

def run_simulation_energy(agent_fn, steps: int = 10, rows: int = 2, cols: int = 2, dirt_prob: float = 0.7, show: bool = True):
    world = make_random_world(rows, cols, dirt_prob)
    agent_pos = (0, 0)
    total_score = 0

    if show:
        print('Starting simulation (energy cost)...')
        print_world(world, agent_pos, rows, cols)

    for t in range(steps):
        percept = sense(world, agent_pos)  # returns {'pos': (r, c), 'is_dirty_here': True/False}
        action = agent_fn(percept)  # returns the action: 'SUCK' or a move
        world, agent_pos = act(world, agent_pos, action, rows, cols)  # moves or cleans; returns new world and position

        score_t = performance_with_energy(world, rows, cols, action)  # clean_score - move_penalty for the new world
        total_score += score_t  # adds this step's score (clean score minus move penalty) to the total

        if show:
            print(f'Time {t}: action={action}, score_this_step={score_t}')
            print_world(world, agent_pos, rows, cols)

    return total_score

score2 = run_simulation_energy(reflex_agent, steps=8, rows=ROWS, cols=COLS, dirt_prob=0.7, show=True)
print('Total score (energy):', score2)

Starting simulation (energy cost)...
AD .D
.C .D

Time 0: action=SUCK, score_this_step=2
AC .D
.C .D

Time 1: action=DOWN, score_this_step=1
.C .D
AC .D

Time 2: action=LEFT, score_this_step=1
.C .D
AC .D

Time 3: action=UP, score_this_step=1
AC .D
.C .D

Time 4: action=DOWN, score_this_step=1
.C .D
AC .D

Time 5: action=UP, score_this_step=1
AC .D
.C .D

Time 6: action=LEFT, score_this_step=1
AC .D
.C .D

Time 7: action=RIGHT, score_this_step=1
.C AD
.C .D

Total score (energy): 9
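
The total can be checked by hand from the trace above. The sketch below copies each action and the number of clean squares after it, then applies the rule "clean squares minus 1 for a move"; the variable names are chosen for this example only.

```python
MOVE_ACTIONS = {'UP', 'DOWN', 'LEFT', 'RIGHT'}

# (action, clean squares after the action), copied from the trace above.
trace = [
    ('SUCK', 2), ('DOWN', 2), ('LEFT', 2), ('UP', 2),
    ('DOWN', 2), ('UP', 2), ('LEFT', 2), ('RIGHT', 2),
]

step_scores = [clean - (1 if action in MOVE_ACTIONS else 0) for action, clean in trace]
print(step_scores)       # [2, 1, 1, 1, 1, 1, 1, 1]
print(sum(step_scores))  # 9
```

Only the first step scores the full 2 points; every later step is a move, so it loses 1 point even when the move is blocked by a wall.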


### Reflection
1. Did the agent’s behaviour change? Why or why not?
2. Is the reflex agent rational under this new performance measure?

Write answers below.

### Your answers
- Q1: No, it did not. We are only penalising moves by subtracting 1 from the cleanliness score. The agent behaves the same way because it is not aware of the energy penalty.
- Q2: No. It behaves exactly as before, so it does not maximise the new performance measure: its random moves now cost points.


## 8. Environment Types (Light Practice)

AIMA describes environment properties such as:
- fully observable vs partially observable
- deterministic vs stochastic
- episodic vs sequential
- static vs dynamic

### Task
Classify each environment (write short answers):
1. This vacuum world
2. Chess
3. Driving in London
4. A recommendation system (Netflix/YouTube)


## Fully observable vs partially observable

- Fully observable: agent can see the complete state relevant to decision-making
- Partially observable: agent cannot see everything (hidden info, noise, uncertainty)

## Deterministic vs stochastic

- Deterministic: next state is completely determined by current state + action
- Stochastic: outcomes involve randomness or uncertainty
- This is not about “complexity” — it’s about uncertainty in outcomes.
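
To make the distinction concrete, the sketch below contrasts a deterministic SUCK with a hypothetical unreliable one. The stochastic variant and its `fail_prob` parameter are assumptions invented for this example; the notebook's own `act` function has no randomness.

```python
import random
from typing import Dict, Tuple

World = Dict[Tuple[int, int], bool]  # True = dirty

def suck_deterministic(world: World, pos: Tuple[int, int]) -> World:
    """SUCK always cleans the current square."""
    new_world = dict(world)
    new_world[pos] = False
    return new_world

def suck_stochastic(world: World, pos: Tuple[int, int],
                    fail_prob: float, rng: random.Random) -> World:
    """Hypothetical variant: SUCK fails with probability fail_prob."""
    new_world = dict(world)
    if rng.random() >= fail_prob:
        new_world[pos] = False
    return new_world

world = {(0, 0): True, (0, 1): True}

# Deterministic: same state + same action gives the same next state every time.
print(suck_deterministic(world, (0, 0)) == suck_deterministic(world, (0, 0)))  # True

# Stochastic: repeating the same action from the same state can give different results.
rng = random.Random(0)
outcomes = [suck_stochastic(world, (0, 0), 0.5, rng)[(0, 0)] for _ in range(10)]
print(outcomes)  # may contain both True (still dirty) and False (cleaned)
```

The agent's own random move choice does not make the environment stochastic; what matters is whether the outcome of a given action in a given state is fixed.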

## Episodic vs sequential

- Episodic: each decision is independent of previous ones
- Sequential: current actions affect future states and decisions

## Static vs dynamic

- Static: environment does not change while the agent is deciding
- Dynamic: environment can change independently of the agent

### Your answers
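1. Vacuum world: partially observable (the percept covers only the current square), deterministic (`act` has no randomness; only the starting dirt and the agent's own move choice are random), sequential (each move or clean affects later states), static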
2. Chess: fully observable (before moving, a player can see where all the other pieces are), deterministic, sequential, static
3. Driving in London: partially observable, stochastic, sequential, dynamic
4. Recommendation system: partially observable, stochastic, sequential, dynamic


## 9. Optional Challenge (For fast finishers)

### Challenge A: Better Reflex Agent
Change the agent so that when the current square is clean it prefers to move toward a dirty square.

Hints:
- You will need to give the agent more information (more percepts).
- For example, sense *adjacent squares*.

### Challenge B: Bigger world
Try a 3×3 grid and see how performance changes.


In [25]:
def sense_with_neighbors(world: Dict[Tuple[int,int], bool],
                         agent_pos: Tuple[int,int],
                         rows: int,
                         cols: int) -> Dict[str, object]:
    """
    Returns a percept that includes:
    - agent position
    - whether current square is dirty
    - dirt status of adjacent squares
    """

    r, c = agent_pos

    percept = {
        'pos': agent_pos,
        'is_dirty_here': world[(r, c)],
        'neighbors': {}
    }

    # Check each direction safely (stay inside grid)
    if r > 0:
        percept['neighbors']['UP'] = world[(r - 1, c)]
    if r < rows - 1:
        percept['neighbors']['DOWN'] = world[(r + 1, c)]
    if c > 0:
        percept['neighbors']['LEFT'] = world[(r, c - 1)]
    if c < cols - 1:
        percept['neighbors']['RIGHT'] = world[(r, c + 1)]

    return percept

# percept example:
# {
#   'pos': (0, 0),
#   'is_dirty_here': False,
#   'neighbors': {
#       'DOWN': True,
#       'RIGHT': False
#   }
# }

def better_reflex_agent(percept: Dict[str, object]) -> str:
    """
    Improved reflex agent:
    - If current square is dirty → SUCK
    - Else, move toward a dirty neighboring square if one exists
    - Else, move randomly
    """

    # If current tile is dirty, clean it
    if percept['is_dirty_here']:
        return 'SUCK'

    # Look at neighboring tiles
    neighbors = percept['neighbors']

    # Prefer moving toward a dirty neighbor
    for direction, is_dirty in neighbors.items():
        if is_dirty:
            return direction

    # If no dirty neighbors, move randomly
    return random.choice(['UP', 'DOWN', 'LEFT', 'RIGHT'])


def run_simulation_energy(agent_fn, steps: int = 10, rows: int = 2, cols: int = 2, dirt_prob: float = 0.7, show: bool = True):
    world = make_random_world(rows, cols, dirt_prob)
    agent_pos = (0, 0)
    total_score = 0

    if show:
        print('Starting simulation (energy cost)...')
        print_world(world, agent_pos, rows, cols)

    for t in range(steps):
        percept = sense_with_neighbors(world, agent_pos, rows, cols)  # returns {'pos', 'is_dirty_here', 'neighbors'}
        action = agent_fn(percept)  # returns the action: 'SUCK' or a move
        world, agent_pos = act(world, agent_pos, action, rows, cols)  # moves or cleans; returns new world and position

        score_t = performance_with_energy(world, rows, cols, action)  # clean_score - move_penalty for the new world
        total_score += score_t  # adds this step's score (clean score minus move penalty) to the total

        if show:
            print(f'Time {t}: action={action}, score_this_step={score_t}')
            print_world(world, agent_pos, rows, cols)

        all_clean = all(not dirty for dirty in world.values())
        if all_clean:
            if show:
                print(f"all tiles clean at time {t}. Ending simulation early.")
            break

    return total_score

# percept = sense_with_neighbors(world, agent_pos, rows, cols)
# action = agent_fn(percept)

score = run_simulation_energy(
    better_reflex_agent,
    steps=8,
    rows=ROWS,
    cols=COLS,
    dirt_prob=0.7,
    show=True
)
print(score)



Starting simulation (energy cost)...
AC .C
.C .D

Time 0: action=UP, score_this_step=2
AC .C
.C .D

Time 1: action=DOWN, score_this_step=2
.C .C
AC .D

Time 2: action=RIGHT, score_this_step=2
.C .C
.C AD

Time 3: action=SUCK, score_this_step=4
.C .C
.C AC

all tiles clean at time 3. Ending simulation early.
10


## 10. Exit Ticket (The things we need to know by today)

Answer briefly:
1. In one sentence, what is an **agent**?
2. In one sentence, what does **rational** mean in AI?
3. Name one thing that makes real environments harder than this vacuum world.


### Your exit ticket
1. An agent is an entity that perceives its environment through sensors and acts upon that environment through actuators.
2. Being rational means choosing the action that is expected to maximise the performance measure, given what the agent has perceived and knows about the environment.
3. Dynamic changes and partial observability.
