# 🧠 Python Foundations for Agents

In this notebook you'll learn how to structure *agent-style* code in Python: states, actions, environments, senses, and control loops.

You won't need any external APIs or heavy libraries, just Python. Later we'll build on this foundation to add learning, reasoning, and LLMs.

---
## 1️⃣ Agent / Environment Interface
Define basic skeletons for an agent and environment that interact in a loop.

In [None]:
from abc import ABC, abstractmethod

class Environment(ABC):
    @abstractmethod
    def reset(self):
        """Reset environment. Return initial state."""
        pass

    @abstractmethod
    def step(self, action):
        """Apply action, return (next_state, reward, done, info)."""
        pass


class Agent(ABC):
    @abstractmethod
    def act(self, state):
        """Given a state, choose an action."""
        pass

    def observe(self, state, action, reward, next_state, done):
        """Optional: agent can learn from outcome."""
        pass


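def run_episode(env: Environment, agent: Agent, max_steps: int = 100) -> float:
    """Run one episode and return the total (undiscounted) reward."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):  # cap the episode in case `done` never becomes True
        action = agent.act(state)
        next_state, reward, done, _info = env.step(action)
        # Give the agent a chance to learn from the transition (no-op by default).
        agent.observe(state, action, reward, next_state, done)
        state = next_state
        total_reward += reward
        if done:
            break
    return total_reward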

# Test skeleton with a dummy environment and agent
class DummyEnv(Environment):
    def __init__(self):
        self.counter = 0

    def reset(self):
        self.counter = 0
        return self.counter

    def step(self, action):
        self.counter += 1
        reward = 1.0
        done = (self.counter >= 5)
        return self.counter, reward, done, {}


import random


class RandomAgent(Agent):
    def act(self, state):
        # Ignore the state and pick one of two dummy actions at random.
        return random.choice([0, 1])

env = DummyEnv()
agent = RandomAgent()
print(run_episode(env, agent))
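
The `observe` hook is optional, so it is easy to overlook. As a minimal sketch, the hypothetical `RecordingAgent` below stores every transition it is shown. `CountingEnv` and `run_recorded_episode` are compact stand-ins for `DummyEnv` and `run_episode` (assumed names, included only so the cell runs on its own); the seeded `random.Random` is also an assumption, added for reproducibility.

```python
import random


class CountingEnv:
    """Stand-in for DummyEnv: reward 1.0 per step, episode ends after 5 steps."""

    def __init__(self):
        self.counter = 0

    def reset(self):
        self.counter = 0
        return self.counter

    def step(self, action):
        self.counter += 1
        return self.counter, 1.0, self.counter >= 5, {}


class RecordingAgent:
    """Hypothetical agent: acts randomly and remembers every transition."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)  # private RNG so runs are repeatable
        self.history = []

    def act(self, state):
        return self.rng.choice([0, 1])

    def observe(self, state, action, reward, next_state, done):
        # A learning agent would update its policy here; this one just logs.
        self.history.append((state, action, reward, next_state, done))


def run_recorded_episode(env, agent, max_steps=100):
    """Same control loop as run_episode, repeated here for self-containment."""
    state = env.reset()
    total = 0.0
    for _ in range(max_steps):
        action = agent.act(state)
        next_state, reward, done, _info = env.step(action)
        agent.observe(state, action, reward, next_state, done)
        state = next_state
        total += reward
        if done:
            break
    return total


recording_agent = RecordingAgent()
recorded_total = run_recorded_episode(CountingEnv(), recording_agent)
print(recorded_total, len(recording_agent.history))  # 5.0 5
```

Each stored tuple has the form `(state, action, reward, next_state, done)`, which is the same information a learning agent would later use to update itself.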

---
## 2️⃣ Representing State & Actions
Let's make a simple *grid world* environment: the agent lives in a 2D grid and can move up, down, left, or right.

In [None]:
class GridEnv(Environment):
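    def __init__(self, width, height, start, goal):
        self.width = width
        self.height = height
        # Store positions as tuples so the goal check in step() also works
        # when the caller passes lists.
        self.start = tuple(start)
        self.goal = tuple(goal)
        # Defined here so step() does not fail if called before reset().
        self.agent_pos = self.start

    def reset(self):
        self.agent_pos = self.start
        return self.agent_pos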

    def step(self, action):
        x, y = self.agent_pos
        if action == 'up':
            y = min(self.height - 1, y + 1)
        elif action == 'down':
            y = max(0, y - 1)
        elif action == 'left':
            x = max(0, x - 1)
        elif action == 'right':
            x = min(self.width - 1, x + 1)
        else:
            raise ValueError(f'Unknown action: {action!r}')

        self.agent_pos = (x, y)
        reward = -0.1  # small penalty to encourage shorter paths
        done = False
        if self.agent_pos == self.goal:
            reward = 1.0
            done = True
        return self.agent_pos, reward, done, {}


class GreedyAgent(Agent):
    def __init__(self, goal):
        self.goal = goal

    def act(self, state):
        x, y = state
        gx, gy = self.goal
        if gx > x:
            return 'right'
        if gx < x:
            return 'left'
        if gy > y:
            return 'up'
        if gy < y:
            return 'down'
        # Already at the goal: any action will do, since the episode ends there.
        return 'up'

env = GridEnv(width=5, height=5, start=(0, 0), goal=(4, 4))
agent = GreedyAgent(goal=(4, 4))
reward = run_episode(env, agent, max_steps=50)
print('Total reward:', reward)
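
Strings such as `'up'` are convenient but easy to mistype. One alternative representation, sketched here under the assumption that the same coordinate convention as `GridEnv` is wanted (`up` increases `y`), is an `Enum` whose values are `(dx, dy)` offsets. The names `Move` and `apply_move` are hypothetical and not part of the classes above.

```python
from enum import Enum


class Move(Enum):
    """Each action carries its own (dx, dy) offset."""
    UP = (0, 1)
    DOWN = (0, -1)
    LEFT = (-1, 0)
    RIGHT = (1, 0)


def apply_move(pos, move, width, height):
    """Return the new position, clamped to the grid like GridEnv.step."""
    dx, dy = move.value
    x = min(width - 1, max(0, pos[0] + dx))
    y = min(height - 1, max(0, pos[1] + dy))
    return (x, y)


print(apply_move((0, 0), Move.UP, 5, 5))    # (0, 1)
print(apply_move((0, 0), Move.LEFT, 5, 5))  # (0, 0), clamped at the wall
```

With this representation the four `if`/`elif` branches collapse into one arithmetic step, and an unknown action cannot be constructed in the first place.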

---
## 3️⃣ Exercise: Random Walk + Heuristic Agent
Implement an agent that takes a random step with probability `epsilon` and a greedy step toward the goal otherwise. Try different values of `epsilon` and see how the total reward changes.

In [None]:
import random

class EpsilonGreedyAgent(Agent):
    def __init__(self, goal, epsilon=0.2):
        self.goal = goal
        self.epsilon = epsilon

    def act(self, state):
        if random.random() < self.epsilon:
            return random.choice(['up', 'down', 'left', 'right'])
        x, y = state
        gx, gy = self.goal
        dx = gx - x
        dy = gy - y
        if abs(dx) > abs(dy):
            return 'right' if dx > 0 else 'left'
        else:
            return 'up' if dy > 0 else 'down'

env = GridEnv(width=5, height=5, start=(0, 0), goal=(4, 4))
agent = EpsilonGreedyAgent(goal=(4, 4), epsilon=0.3)
print(run_episode(env, agent, max_steps=50))
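
A single episode is noisy, so one way to approach the exercise is to average many episodes per `epsilon`. The sketch below is one possible setup, not the only one: `choose_action` and `episode_return` are hypothetical function-style stand-ins that mirror `EpsilonGreedyAgent.act` and the `GridEnv` reward rules so the cell runs on its own, and the seed, the list of `epsilon` values, and the 500 episodes per value are arbitrary assumptions.

```python
import random
from statistics import mean

MOVES = {'up': (0, 1), 'down': (0, -1), 'left': (-1, 0), 'right': (1, 0)}


def choose_action(pos, goal, epsilon, rng):
    """Same policy as EpsilonGreedyAgent.act, but with an explicit RNG."""
    if rng.random() < epsilon:
        return rng.choice(list(MOVES))
    dx, dy = goal[0] - pos[0], goal[1] - pos[1]
    if abs(dx) > abs(dy):
        return 'right' if dx > 0 else 'left'
    return 'up' if dy > 0 else 'down'


def episode_return(epsilon, rng, size=5, goal=(4, 4), max_steps=50):
    """Mirror GridEnv: -0.1 per ordinary step, +1.0 for the step that hits the goal."""
    pos, total = (0, 0), 0.0
    for _ in range(max_steps):
        dx, dy = MOVES[choose_action(pos, goal, epsilon, rng)]
        pos = (min(size - 1, max(0, pos[0] + dx)),
               min(size - 1, max(0, pos[1] + dy)))
        if pos == goal:
            return total + 1.0
        total -= 0.1
    return total  # ran out of steps without reaching the goal


rng = random.Random(0)  # fixed seed (assumption) so the sweep is repeatable
sweep = {}
for eps in (0.0, 0.1, 0.3, 0.5, 0.9):
    sweep[eps] = mean(episode_return(eps, rng) for _ in range(500))
    print(f"epsilon={eps:.1f}  mean return={sweep[eps]:+.3f}")
```

With `epsilon=0.0` every episode follows the same 8-step path, so the mean is 7 × (-0.1) + 1.0 ≈ 0.3. Larger values add detours, and each extra step costs another 0.1.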

---
## 4️⃣ Wrapping Up & Next Steps
You've built your first simulated agent!

In the next notebook, we'll implement **search and planning algorithms**:
- Breadth-First Search (BFS)
- Depth-First Search (DFS)
- A* with heuristics

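These will turn your agent into something that can *plan a route* before moving instead of choosing greedy moves one at a time. On a grid where every move costs the same, BFS and A* (with an admissible heuristic) find shortest routes, while DFS finds a route that is not necessarily the shortest.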