# Task B3 — Minimal AI Agents (NumPy)
Upper Secondary / Section B — Structured Tasks

Constraints: Use Python 3 and NumPy only. No external LLM calls. We simulate planning with a deterministic stub.

Scope:
- Create a minimal agent with tool use and step-wise planning; the LLM planner is simulated by a deterministic stub.
- Compare single-agent and multi-agent designs (roles and coordination).
- Evaluate with success rate, error taxonomy, and simple cost/time counts.
- Apply guardrails: permissions, sandboxing, rate limits, and transparent logs.
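
Of the guardrails listed above, rate limits are the one the provided constants do not cover. A minimal sketch of one way to add them is a fixed call budget. The `CallBudget` class is a hypothetical helper, not part of the task template, and counting calls instead of measuring wall-clock time is an assumption made to keep the behaviour deterministic.

```python
class CallBudget:
    """Allow at most `limit` tool calls, then refuse (hypothetical helper)."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def try_acquire(self):
        # Refuse once the budget is spent; otherwise consume one call.
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


budget = CallBudget(limit=2)
results = [budget.try_acquire() for _ in range(3)]
print(results)  # [True, True, False]
```

An agent could call `try_acquire()` before every tool call and log a rate-limit error when it returns `False`.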

## Structured Theory Questions
1. Sketch the agent loop (sense → plan → act). What state is tracked?
2. List two risks of autonomous tool-use and one guardrail for each.
3. Give two evaluation metrics suitable for small deterministic tasks.
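
As a starting point for question 1, the loop could be sketched as below. This is an illustration, not a model answer: `run_loop`, `toy_plan`, and the `total` tool are hypothetical names, and the tracked state (step counter, latest observation, history) is only one possible choice.

```python
import numpy as np


def run_loop(observation, plan_fn, tools, max_steps=3):
    """Toy sense -> plan -> act loop (all names here are illustrative)."""
    state = {"step": 0, "observation": observation, "history": []}
    while state["step"] < max_steps:
        # Plan: choose the next action from the current state (None means stop).
        action = plan_fn(state)
        if action is None:
            break
        tool_name, kwargs = action
        # Act: call the chosen tool.
        result = tools[tool_name](**kwargs)
        # Sense: the tool result becomes the next observation.
        state["history"].append((tool_name, result))
        state["observation"] = result
        state["step"] += 1
    return state


def toy_plan(state):
    # Hypothetical planner: sum once, then stop.
    if state["step"] == 0:
        return ("total", {"arr": state["observation"]})
    return None


toy_tools = {"total": lambda arr: float(np.sum(arr))}
final_state = run_loop(np.array([1, 2, 3]), toy_plan, toy_tools)
print(final_state["observation"], final_state["step"])
```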

## Design Challenge
Design two flows:
- Single-agent (generalist)
- Dual-agent (Planner + Executor)

Describe allowed tools, max steps, and the logging fields you will capture (e.g., tool name, inputs, outputs, error).
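
One possible shape for a log record, built from the example fields above, is sketched here. The `make_log_entry` helper and the extra `step` field are assumptions for illustration; the task does not prescribe a schema.

```python
def make_log_entry(step, tool, inputs, output=None, error=None):
    """Build one log record as a plain dict (hypothetical schema)."""
    return {
        "step": step,
        "tool": tool,
        "inputs": inputs,
        "output": output,
        "error": error,
    }


logs = []
# A successful call records its output.
logs.append(make_log_entry(0, "sum_array", {"arr": [1, 2, 3]}, output=6.0))
# A blocked call records the reason; "delete_file" is a made-up disallowed tool.
logs.append(make_log_entry(1, "delete_file", {"path": "x"}, error="tool not allowed"))
for entry in logs:
    print(entry)
```

Keeping every record the same shape, including failed calls, makes it straightforward to count errors by type later.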


In [None]:
import numpy as np

## Provided Tasks (array-based sandbox)

In [None]:
# DO NOT MODIFY
TASKS = [
    {'name':'compute sum','type':'sum','input': np.array([1,2,3])},
    {'name':'compute mean','type':'mean','input': np.array([4,6,10])},
    {'name':'threshold classify','type':'threshold','input': np.array([0.2, 0.9, 0.4]), 'threshold': 0.5},
    {'name':'compute mean','type':'mean','input': np.array([10,0])}
]
ALLOWED_TOOLS = {'sum_array','mean_array','threshold_classify'}
MAX_STEPS = 3

## Tools (NumPy-only)

In [None]:
def sum_array(arr: np.ndarray) -> float:
    """Return sum(arr) as float.
    Implement using NumPy only.
    """
    pass


def mean_array(arr: np.ndarray) -> float:
    """Return mean(arr) as float.
    Implement using NumPy only.
    """
    pass


def threshold_classify(arr: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Return an integer array with 1 where arr >= threshold and 0 elsewhere.
    """
    pass


TOOLS = {
    'sum_array': sum_array,
    'mean_array': mean_array,
    'threshold_classify': threshold_classify
}

## Deterministic Planner (simulated LLM)
The planner maps a task dict to a tool-call plan (a list of steps).
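
One way such a mapping could look is a lookup from the task's `type` field to a tool name. This is a sketch, not the required solution: `TYPE_TO_TOOL` and `sketch_plan` are hypothetical names, and returning an empty plan for an unknown type is an assumption.

```python
import numpy as np

# Hypothetical lookup from task type to tool name.
TYPE_TO_TOOL = {
    "sum": "sum_array",
    "mean": "mean_array",
    "threshold": "threshold_classify",
}


def sketch_plan(task):
    """Return a one-step plan for a known task type, else an empty plan."""
    tool = TYPE_TO_TOOL.get(task["type"])
    if tool is None:
        return []
    args = {"arr": task["input"]}
    # Only the threshold tool takes the extra argument.
    if task["type"] == "threshold":
        args["threshold"] = task.get("threshold", 0.5)
    return [{"tool": tool, "args": args}]


plan = sketch_plan({"name": "threshold classify", "type": "threshold",
                    "input": np.array([0.2, 0.9, 0.4]), "threshold": 0.5})
print(plan[0]["tool"], sorted(plan[0]["args"]))
```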

In [None]:
def planner_stub(task):
    """Map task dict to a list of tool steps.
    Example step: {'tool': 'sum_array', 'args': {'arr': np.array([...])}}
    """
    pass

## Agent Implementations
- Guardrails: allowed tools, max steps, logging
- SingleAgent executes planner_stub internally
- PlannerAgent creates plan; ExecutorAgent executes it
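
The guardrails listed above could be enforced along the following lines. This is a sketch under stated assumptions, not the expected implementation: `run_plan_guarded` is a hypothetical function, execution stops at the first error, and the log fields are illustrative.

```python
import numpy as np


def run_plan_guarded(plan, tools, allowed_tools, max_steps):
    """Run plan steps in order, enforcing an allow-list and a step budget."""
    logs, output = [], None
    for i, step in enumerate(plan):
        entry = {"step": i, "tool": step["tool"], "output": None, "error": None}
        if i >= max_steps:
            entry["error"] = "max steps exceeded"
        elif step["tool"] not in allowed_tools or step["tool"] not in tools:
            entry["error"] = "tool not allowed"
        else:
            try:
                output = tools[step["tool"]](**step["args"])
                entry["output"] = output
            except Exception as exc:  # log tool failures instead of crashing
                entry["error"] = repr(exc)
        logs.append(entry)
        if entry["error"] is not None:
            break  # assumption: stop at the first error
    return output, logs


demo_tools = {"sum_array": lambda arr: float(np.sum(arr))}
demo_plan = [
    {"tool": "sum_array", "args": {"arr": np.array([1, 2, 3])}},
    {"tool": "shell", "args": {}},  # made-up tool that is not on the allow-list
]
out, logs = run_plan_guarded(demo_plan, demo_tools, {"sum_array"}, max_steps=3)
print(out, [e["error"] for e in logs])
```

Every step is logged before the loop decides whether to continue, so a blocked or failed call still leaves a record.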

In [None]:
class SingleAgent:
    def __init__(self, allowed_tools, max_steps):
        """Store allowed tools and max steps; initialise logs.
        """
        pass
    def run(self, task):
        """Execute planner_stub(task) with guardrails (allowed tools, max steps).
        Log each step and return final output.
        """
        pass

class PlannerAgent:
    def plan(self, task):
        """Return planner_stub(task).
        """
        pass

class ExecutorAgent:
    def __init__(self, allowed_tools, max_steps):
        """Store allowed tools and max steps; initialise logs.
        """
        pass
    def execute(self, plan):
        """Execute tool steps with guardrails and logging; return final output.
        """
        pass

## Evaluation Metrics
We evaluate success on the provided TASKS, track errors, and count steps.
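
A success rate could be computed by comparing each output with a reference value, as sketched below. The `success_rate` helper is hypothetical, and the reference values are worked out by hand from the task inputs; the task sheet itself does not list expected answers.

```python
import numpy as np


def success_rate(outputs, expected):
    """Fraction of outputs that match their reference value."""
    hits = 0
    for out, ref in zip(outputs, expected):
        # A None output (for example after a blocked tool call) counts as a failure.
        if out is not None and np.shape(out) == np.shape(ref) and np.allclose(out, ref):
            hits += 1
    return hits / len(expected)


# Hand-computed references: sum, mean, threshold labels, mean.
expected = [6.0, 20 / 3, np.array([0, 1, 0]), 5.0]
# Illustrative outputs where the last task failed.
outputs = [6.0, 20 / 3, np.array([0, 1, 0]), None]
print(success_rate(outputs, expected))  # 0.75
```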

In [None]:
def evaluate_single(agent, tasks):
    """Return dict with successes, errors, steps, outputs for single-agent run.
    """
    pass


def evaluate_dual(planner, executor, tasks):
    """Return dict with successes, errors, steps, outputs for dual-agent setup.
    """
    pass

In [None]:
# Suggested self-checks (uncomment after implementation)
# single = SingleAgent(ALLOWED_TOOLS, MAX_STEPS)
# res_single = evaluate_single(single, TASKS)
# planner = PlannerAgent()
# executor = ExecutorAgent(ALLOWED_TOOLS, MAX_STEPS)
# res_dual = evaluate_dual(planner, executor, TASKS)
# print('single:', res_single)
# print('dual:', res_dual)

## Reflection
Explain the guardrails and logs you implemented, and the trade-offs you observed between the single-agent and dual-agent setups.