# Task B3 — Minimal AI Agents (NumPy)
Upper Secondary / Section B — Structured Tasks

Constraints: Use Python 3 and NumPy only. No external LLM calls. We simulate planning with a deterministic stub.

Scope:
- Create a minimal LLM-style agent with tool use and step-wise planning; the LLM is simulated by a deterministic stub.
- Compare single-agent and multi-agent designs (roles and coordination).
- Evaluate with success rate, error taxonomy, and simple cost/time counts.
- Apply guardrails: permissions, sandboxing, rate limits, and transparent logs.
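
Of the guardrails listed above, rate limits are the easiest to overlook in a small deterministic sandbox. The following is a minimal sketch of a count-based limit, assuming a budget of tool calls rather than a time window. The class name `CallBudget` and the error code `rate_limit_exceeded` are illustrative and are not part of the provided code.

```python
import numpy as np

class CallBudget:
    """Hypothetical count-based rate limit: at most `max_calls` tool calls."""

    def __init__(self, max_calls):
        self.max_calls = max_calls
        self.calls = 0
        self.logs = []

    def call(self, name, fn, *args):
        # Refuse the call and log the reason once the budget is spent.
        if self.calls >= self.max_calls:
            self.logs.append({'tool': name, 'error': 'rate_limit_exceeded'})
            return None
        self.calls += 1
        out = fn(*args)
        self.logs.append({'tool': name, 'out': out})
        return out

budget = CallBudget(max_calls=2)
data = np.array([1, 2, 3])
# Three attempts against a budget of two: the last one is refused.
results = [budget.call('sum', lambda a: float(np.sum(a)), data) for _ in range(3)]
print(results)
print(budget.logs[-1])
```

Because the limit counts calls instead of measuring time, the behaviour is fully repeatable, which suits the deterministic setting of this task.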

## Structured Theory Questions
1. Sketch the agent loop (sense → plan → act). What state is tracked?
2. List two risks of autonomous tool use and one guardrail for each.
3. Give two evaluation metrics suitable for small deterministic tasks.

## Design Challenge
Design two flows:
- Single-agent (generalist)
- Dual-agent (Planner + Executor)

Describe allowed tools, max steps, and the logging fields you will capture (e.g., tool name, inputs, outputs, error).


In [None]:
import numpy as np

## Provided Tasks (array-based sandbox)

In [None]:
# DO NOT MODIFY
TASKS = [
    {'name':'compute sum','type':'sum','input': np.array([1,2,3])},
    {'name':'compute mean','type':'mean','input': np.array([4,6,10])},
    {'name':'threshold classify','type':'threshold','input': np.array([0.2, 0.9, 0.4]), 'threshold': 0.5},
    {'name':'compute mean','type':'mean','input': np.array([10,0])}
]
ALLOWED_TOOLS = {'sum_array','mean_array','threshold_classify'}
MAX_STEPS = 3

## Tools (NumPy-only)

In [None]:
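def sum_array(arr: np.ndarray) -> float:
    # Sum of all elements, returned as a plain Python float.
    return float(np.sum(arr))

def mean_array(arr: np.ndarray) -> float:
    # Arithmetic mean, returned as a plain Python float.
    return float(np.mean(arr))

def threshold_classify(arr: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    # 1 where the value is at or above the threshold, 0 otherwise.
    return (arr >= threshold).astype(int)

# Registry mapping tool names (as used in plans) to their functions.
TOOLS = {
    'sum_array': sum_array,
    'mean_array': mean_array,
    'threshold_classify': threshold_classify
}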
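
The tools above trust their input. `np.mean` on an empty array, for example, returns `nan` and emits a RuntimeWarning rather than failing loudly. One possible sandboxing step is to validate inputs before calling a tool. The sketch below assumes a hypothetical wrapper `safe_mean` and made-up error codes; it is one option, not part of the provided tools.

```python
import numpy as np

def safe_mean(arr):
    """Hypothetical wrapper: return (value, error_code) after validating input."""
    arr = np.asarray(arr, dtype=float)
    if arr.size == 0:
        return None, 'empty_input'
    if not np.all(np.isfinite(arr)):
        return None, 'non_finite_input'
    return float(np.mean(arr)), None

ok = safe_mean(np.array([4, 6, 10]))        # valid input
empty = safe_mean(np.array([]))             # rejected: nothing to average
bad = safe_mean(np.array([1.0, np.nan]))    # rejected: contains NaN
print(ok, empty, bad)
```

Returning an error code instead of raising keeps the failure visible to the agent, which can then write it to its log.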

## Deterministic Planner (simulated LLM)
Maps a task dict to a tool-call plan (a list of steps). An unknown task type yields an empty plan.

In [None]:
def planner_stub(task):
    t = task.get('type')
    if t == 'sum':
        return [{'tool':'sum_array','args':{'arr': task['input']}}]
    if t == 'mean':
        return [{'tool':'mean_array','args':{'arr': task['input']}}]
    if t == 'threshold':
        return [{'tool':'threshold_classify','args':{'arr': task['input'], 'threshold': task.get('threshold',0.5)}}]
    return []  # unknown task
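
The same mapping can also be written as a lookup table, which may be easier to extend when new task types are added. This is a sketch of an alternative, not a replacement for `planner_stub`; the names `PLAN_TABLE` and `plan_from_table` are hypothetical.

```python
import numpy as np

# Hypothetical table: task type -> (tool name, extra arguments with defaults).
PLAN_TABLE = {
    'sum': ('sum_array', {}),
    'mean': ('mean_array', {}),
    'threshold': ('threshold_classify', {'threshold': 0.5}),
}

def plan_from_table(task):
    entry = PLAN_TABLE.get(task.get('type'))
    if entry is None:
        return []  # unknown task type -> empty plan
    tool, defaults = entry
    args = {'arr': task['input']}
    # Copy extra arguments from the task, falling back to the table default.
    for key, default in defaults.items():
        args[key] = task.get(key, default)
    return [{'tool': tool, 'args': args}]

plan = plan_from_table({'type': 'threshold', 'input': np.array([0.2, 0.9]), 'threshold': 0.7})
unknown = plan_from_table({'type': 'median', 'input': np.array([1, 2, 3])})
print(plan[0]['tool'], plan[0]['args']['threshold'], unknown)
```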

## Agent Implementations
- Guardrails: allowed tools, max steps, logging
- SingleAgent executes planner_stub internally
- PlannerAgent creates plan; ExecutorAgent executes it

In [None]:
class SingleAgent:
    def __init__(self, allowed_tools, max_steps):
        self.allowed_tools = allowed_tools
        self.max_steps = max_steps
        self.logs = []
    def run(self, task):
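        plan = planner_stub(task)
        if not plan:
            # Unknown task type: log it so the failure has a recorded cause.
            self.logs.append({'error':'empty_plan'})
            return None
        steps = 0
        last_out = None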
        for step in plan:
            steps += 1
            if steps > self.max_steps:
                self.logs.append({'error':'max_steps_exceeded'})
                return None
            tool = step['tool']
            if tool not in self.allowed_tools:
                self.logs.append({'error':'tool_not_allowed', 'tool':tool})
                return None
            fn = TOOLS[tool]
            out = fn(**step['args'])
            self.logs.append({'tool':tool,'args':step['args'],'out':out})
            last_out = out
        return last_out

class PlannerAgent:
    def plan(self, task):
        return planner_stub(task)

class ExecutorAgent:
    def __init__(self, allowed_tools, max_steps):
        self.allowed_tools = allowed_tools
        self.max_steps = max_steps
        self.logs = []
    def execute(self, plan):
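        if not plan:
            # Nothing to execute: log it so the failure has a recorded cause.
            self.logs.append({'error':'empty_plan'})
            return None
        steps = 0
        last_out = None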
        for step in plan:
            steps += 1
            if steps > self.max_steps:
                self.logs.append({'error':'max_steps_exceeded'})
                return None
            tool = step['tool']
            if tool not in self.allowed_tools:
                self.logs.append({'error':'tool_not_allowed', 'tool':tool})
                return None
            fn = TOOLS[tool]
            out = fn(**step['args'])
            self.logs.append({'tool':tool,'args':step['args'],'out':out})
            last_out = out
        return last_out
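
Both agents check the guardrails one step at a time, so earlier steps may already have run when a later step is rejected. One alternative is to check the whole plan before executing anything. The sketch below assumes a hypothetical helper `validate_plan` that returns an error code, or `None` when the plan is acceptable.

```python
def validate_plan(plan, allowed_tools, max_steps):
    """Hypothetical pre-check run before any tool is executed."""
    if not plan:
        return 'empty_plan'
    if len(plan) > max_steps:
        return 'max_steps_exceeded'
    for step in plan:
        if step.get('tool') not in allowed_tools:
            return 'tool_not_allowed'
    return None

allowed = {'sum_array', 'mean_array'}
good_plan = [{'tool': 'sum_array', 'args': {}}]
blocked_plan = [{'tool': 'sum_array', 'args': {}}, {'tool': 'delete_files', 'args': {}}]
long_plan = good_plan * 4

print(validate_plan(good_plan, allowed, 3))
print(validate_plan(blocked_plan, allowed, 3))
print(validate_plan(long_plan, allowed, 3))
```

The trade-off is that pre-validation needs the full plan up front, whereas the step-wise check also works when steps are produced one at a time.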

## Evaluation Metrics
We evaluate success on the provided TASKS, track errors, and count steps.

In [None]:
def _record(out, new_logs, result):
    # Update the running totals with one task's output and its new log entries.
    result['outputs'].append(out)
    # Count only the tool calls that actually ran for this task.
    result['steps'] += sum(1 for log in new_logs if 'tool' in log and 'error' not in log)
    if out is None:
        # Inspect this task's logs only, so an older error is never re-counted.
        err = next((log['error'] for log in reversed(new_logs) if 'error' in log), 'no_output')
        result['errors'][err] = result['errors'].get(err, 0) + 1
    else:
        result['successes'] += 1

def evaluate_single(agent, tasks):
    result = {'successes': 0, 'errors': {}, 'steps': 0, 'outputs': []}
    for t in tasks:
        start = len(agent.logs)  # logs accumulate across tasks
        out = agent.run(t)
        _record(out, agent.logs[start:], result)
    return result

def evaluate_dual(planner, executor, tasks):
    result = {'successes': 0, 'errors': {}, 'steps': 0, 'outputs': []}
    for t in tasks:
        start = len(executor.logs)  # logs accumulate across tasks
        out = executor.execute(planner.plan(t))
        _record(out, executor.logs[start:], result)
    return result
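
The evaluation functions return raw counts. A success rate and an average step count can be derived from them, as in the sketch below. The helper name `summarize` is hypothetical, and the `example` dict is a hand-made illustration, not a result produced by the agents.

```python
def summarize(result, n_tasks):
    """Hypothetical helper: turn raw counts into rates and a sorted error table."""
    if n_tasks == 0:
        return {'success_rate': 0.0, 'avg_steps': 0.0, 'error_counts': {}}
    return {
        'success_rate': result['successes'] / n_tasks,
        'avg_steps': result['steps'] / n_tasks,
        # Sort by error name so the table prints in a stable order.
        'error_counts': dict(sorted(result['errors'].items())),
    }

# Illustrative input only: 3 of 4 tasks succeeded, 1 was blocked by a guardrail.
example = {'successes': 3, 'errors': {'tool_not_allowed': 1}, 'steps': 3, 'outputs': []}
summary = summarize(example, n_tasks=4)
print(summary)
```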

In [None]:
# Self-check tests
single = SingleAgent(ALLOWED_TOOLS, MAX_STEPS)
res_single = evaluate_single(single, TASKS)
print('single:', res_single)
planner = PlannerAgent()
executor = ExecutorAgent(ALLOWED_TOOLS, MAX_STEPS)
res_dual = evaluate_dual(planner, executor, TASKS)
print('dual:', res_dual)
# Expect at least 3 successes out of 4 for both modes
assert res_single['successes'] >= 3
assert res_dual['successes'] >= 3
# Outputs sanity
assert isinstance(res_single['outputs'][0], float)
assert isinstance(res_dual['outputs'][1], float)
print('All tests passed.')

## Reflection
Explain the guardrails and logs you used, and the trade-offs you observed between the single-agent and multi-agent setups.