Authored by: Aryan Mistry

# Capstone: Build Your Own Agents

Now that you've seen how tools, simple agents, and workflows work, it's time to design your own agents. An agent is a modular component that handles one kind of query; by composing several agents you can build a system that delegates each task to a specialised module. In this lab you'll implement a small collection of agents and a dispatcher that routes each user question to the appropriate agent. [19]

## 1. Agent Interface

All agents will inherit from a `BaseAgent` abstract class. Each agent must
implement two methods:

- `can_handle(query: str) -> bool` – return `True` if the agent knows how
  to answer the query.
- `handle(query: str) -> str` – return a response for the query. This could
  call tools, look up data, or return an error message.

Using a common interface makes it easy to register multiple agents and
delegate queries to the appropriate one.


In [None]:

from abc import ABC, abstractmethod

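class BaseAgent(ABC):
    """Abstract base class for agents.

    Subclasses must implement `can_handle` to declare whether they can process
    a query, and `handle` to return a response.
    """
    name: str  # short identifier, set by each subclass

    @abstractmethod
    def can_handle(self, query: str) -> bool:
        """Return True if this agent knows how to answer `query`."""

    @abstractmethod
    def handle(self, query: str) -> str:
        """Return a response (or an error message) for `query`."""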
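
To see what the abstract base class enforces, here is a minimal sketch. `EchoAgent` and `IncompleteAgent` are hypothetical classes used only for illustration, and the `BaseAgent` definition is repeated so the snippet runs on its own.

```python
from abc import ABC, abstractmethod


class BaseAgent(ABC):
    """Copy of the interface above so this snippet runs on its own."""
    name: str

    @abstractmethod
    def can_handle(self, query: str) -> bool: ...

    @abstractmethod
    def handle(self, query: str) -> str: ...


class EchoAgent(BaseAgent):
    """Hypothetical agent: repeats whatever follows the word 'echo'."""
    name = "echo"

    def can_handle(self, query: str) -> bool:
        return query.lower().startswith("echo ")

    def handle(self, query: str) -> str:
        # Return everything after the leading "echo " keyword
        return query[len("echo "):]


class IncompleteAgent(BaseAgent):
    """Hypothetical agent that forgets `handle`, so it stays abstract."""
    name = "incomplete"

    def can_handle(self, query: str) -> bool:
        return True


agent = EchoAgent()
print(agent.can_handle("echo hello"))  # True
print(agent.handle("echo hello"))      # hello

try:
    IncompleteAgent()
except TypeError as exc:
    # ABC refuses to instantiate a class with unimplemented abstract methods
    print(f"Cannot instantiate: {exc}")
```

Because the check happens at instantiation time, a missing method shows up as soon as the agent is created rather than when the dispatcher first calls it.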


## 2. Example Agents

We'll implement four example agents:

- **ArithmeticAgent** – computes simple arithmetic expressions using the
  `safe_eval` helper. It triggers on queries containing arithmetic operators
  or words like "plus" and returns the numeric result.
- **WeatherAgent** – looks up a fake temperature in a predefined database.
  It triggers on queries mentioning weather or temperature.
- **DictionaryAgent** – returns definitions of words from a small built-in
  dictionary. It triggers on queries beginning with "define".
- **FactAgent** – answers simple factual questions from a predefined fact
  dictionary (e.g. capitals, basic science facts). It triggers when its
  keywords appear in the query.


In [None]:

import re
import ast

# Reuse safe_eval from previous notebooks

def safe_eval(expr: str) -> float:
    parsed = ast.parse(expr, mode='eval')
    def eval_node(node):
        if isinstance(node, ast.Expression):
            return eval_node(node.body)
        elif isinstance(node, ast.BinOp):
            left = eval_node(node.left)
            right = eval_node(node.right)
            if isinstance(node.op, ast.Add):
                return left + right
            elif isinstance(node.op, ast.Sub):
                return left - right
            elif isinstance(node.op, ast.Mult):
                return left * right
            elif isinstance(node.op, ast.Div):
                return left / right
            else:
                raise ValueError("Unsupported operator")
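        elif isinstance(node, ast.UnaryOp):
            # Only unary plus and minus are arithmetic; reject `not` and `~`
            if isinstance(node.op, ast.UAdd):
                return eval_node(node.operand)
            elif isinstance(node.op, ast.USub):
                return -eval_node(node.operand)
            raise ValueError("Unsupported unary operator")
        elif isinstance(node, ast.Constant):
            # Accept numeric literals only (bool is a subclass of int, so exclude it)
            if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
                return node.value
            raise ValueError("Unsupported constant")
        else:
            raise ValueError("Unsupported expression component")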
    return eval_node(parsed)

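class ArithmeticAgent(BaseAgent):
    """Agent that evaluates arithmetic expressions.

    Triggers on queries containing numbers and arithmetic operators.
    """
    name = "arithmetic"
    # Spelled-out operators and the symbols they stand for
    WORD_OPS = {'plus': '+', 'minus': '-', 'times': '*', 'divided by': '/'}
    def can_handle(self, query: str) -> bool:
        q = query.lower()
        has_digit = any(ch.isdigit() for ch in q)
        has_op = any(op in q for op in ['+', '-', '*', '/', 'plus', 'minus', 'times', 'divided'])
        return has_digit and has_op
    def handle(self, query: str) -> str:
        q = query.lower()
        for word, symbol in self.WORD_OPS.items():
            q = q.replace(word, symbol)
        # Keep only expression characters, then strip: leading spaces would
        # make ast.parse fail with "unexpected indent"
        expr = ''.join(ch for ch in q if ch.isdigit() or ch in '+-*/(). ').strip()
        try:
            result = safe_eval(expr)
            return f"The result is {result}."
        except Exception as e:
            return f"I couldn't compute that expression: {e}"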

class WeatherAgent(BaseAgent):
    """Agent that returns mock weather reports.

    Recognises queries containing the words 'weather' or 'temperature'.
    """
    name = "weather"
    def __init__(self):
        self.database = {'London': 18.5, 'New York': 22.0, 'Paris': 20.0, 'Tokyo': 25.0}
    def can_handle(self, query: str) -> bool:
        q = query.lower()
        return 'weather' in q or 'temperature' in q
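    def handle(self, query: str) -> str:
        q = query.lower()
        # Search for a known city anywhere in the query; this copes with
        # trailing punctuation ("Tokyo?") and multi-word names ("New York")
        for city, temperature in self.database.items():
            if city.lower() in q:
                return f"The temperature in {city} is {temperature}°C."
        guess = query.split()[-1].strip('?.!,').title()
        return f"Sorry, I don't have weather data for {guess}."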

class DictionaryAgent(BaseAgent):
    """Agent that provides simple definitions for words.

    It looks up a word in a built‑in dictionary.
    """
    name = "dictionary"
    def __init__(self):
        self.definitions = {
            'photosynthesis': 'the process by which green plants convert sunlight into chemical energy',
            'ai': 'the field of study devoted to creating machines capable of intelligent behaviour',
            'economics': 'the social science that studies the production, distribution, and consumption of goods and services'
        }
    def can_handle(self, query: str) -> bool:
        return query.lower().startswith('define ')
    def handle(self, query: str) -> str:
        word = query.split(maxsplit=1)[-1].strip(' ?.!').lower()  # drop trailing punctuation
        definition = self.definitions.get(word)
        if definition:
            return f"{word.title()}: {definition}."
        return f"Sorry, I don't have a definition for {word}."

class FactAgent(BaseAgent):
    """Agent that fetches factual information from a small knowledge base.

    For example, it can answer 'What is the capital of France?'.
    """
    name = "fact"
    def __init__(self):
        self.facts = {
            'capital of france': 'Paris is the capital of France.',
            'first man on the moon': 'Neil Armstrong was the first person to walk on the Moon.',
            'speed of light': 'The speed of light in vacuum is approximately 299,792 km per second.'
        }
    def can_handle(self, query: str) -> bool:
        q = query.lower()
        return any(key in q for key in self.facts)
    def handle(self, query: str) -> str:
        q = query.lower()
        for key, fact in self.facts.items():
            if key in q:
                return fact
        return "I'm sorry, I don't know the answer to that question."
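
`safe_eval` relies on `ast.parse(..., mode='eval')` and then walks only the node types it recognises. The following sketch, which uses nothing beyond the standard `ast` module, shows what the parser returns for a simple product, why an expression string should be stripped before parsing, and what a function call looks like to an evaluator that only accepts arithmetic nodes.

```python
import ast

# Parse in 'eval' mode: the input must be a single expression
tree = ast.parse("7 * 6", mode="eval")
print(type(tree).__name__)          # Expression
print(type(tree.body).__name__)     # BinOp
print(type(tree.body.op).__name__)  # Mult

# Leading whitespace is treated as indentation and rejected
try:
    ast.parse("   7 * 6", mode="eval")
except SyntaxError as exc:  # IndentationError is a subclass of SyntaxError
    print(f"Rejected: {exc.msg}")

# Stripping the string first avoids the problem
stripped_tree = ast.parse("   7 * 6".strip(), mode="eval")
print(type(stripped_tree.body).__name__)  # BinOp

# A function call parses without error, but it is a Call node, which an
# evaluator that only accepts BinOp, UnaryOp and Constant will refuse
call_tree = ast.parse("__import__('os')", mode="eval")
print(type(call_tree.body).__name__)  # Call
```

Parsing never executes the input, so inspecting the tree and rejecting unknown node types is what keeps this approach safer than calling `eval` directly.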


## 3. Registering and Dispatching

We'll register our agents in a list and write a dispatcher that iterates
through them to find one that can handle a given query. If no agent can
respond, the dispatcher returns a default message. The order of agents in
`agents` matters: earlier agents have higher priority.


In [None]:

arith = ArithmeticAgent()
wea = WeatherAgent()
dict_agent = DictionaryAgent()
fact_agent = FactAgent()

# Earlier agents take priority when several can handle the same query
agents = [arith, wea, dict_agent, fact_agent]

def dispatch(query: str) -> str:
    for agent in agents:
        if agent.can_handle(query):
            return agent.handle(query)
    return "I'm not sure how to answer that."

queries = [
    'What is 7 * 6?',
    'What is the temperature in Tokyo?',
    'Define photosynthesis',
    'Who was the first man on the moon?',
    'How do airplanes fly?'
]
for q in queries:
    print(f"Q: {q}\nA: {dispatch(q)}")


Q: What is 7 * 6?
A: The result is 42.
Q: What is the temperature in Tokyo?
A: The temperature in Tokyo is 25.0°C.
Q: Define photosynthesis
A: Photosynthesis: the process by which green plants convert sunlight into chemical energy.
Q: Who was the first man on the moon?
A: Neil Armstrong was the first person to walk on the Moon.
Q: How do airplanes fly?
A: I'm not sure how to answer that.
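
The effect of agent order is easiest to see when two agents both claim the same query. The sketch below is an illustration only: `KeywordAgent` and `dispatch_with` are hypothetical names, and `dispatch_with` uses the same first-match loop as `dispatch` but takes the agent list as a parameter so the order can be varied.

```python
class KeywordAgent:
    """Hypothetical agent that claims any query containing its keyword."""

    def __init__(self, name: str, keyword: str):
        self.name = name
        self.keyword = keyword

    def can_handle(self, query: str) -> bool:
        return self.keyword in query.lower()

    def handle(self, query: str) -> str:
        return f"[{self.name}] handled: {query}"


def dispatch_with(agents: list, query: str) -> str:
    """First-match dispatch: the first agent that accepts the query answers."""
    for agent in agents:
        if agent.can_handle(query):
            return agent.handle(query)
    return "I'm not sure how to answer that."


temp = KeywordAgent("temperature", "temperature")
conv = KeywordAgent("conversion", "fahrenheit")
query = "Convert the temperature 20 C to Fahrenheit"

# Both agents match; whichever comes first in the list answers
print(dispatch_with([temp, conv], query))  # starts with [temperature]
print(dispatch_with([conv, temp], query))  # starts with [conversion]
```

A practical consequence is that agents with narrow triggers are usually better placed before agents with broad ones, otherwise the broad agent shadows them.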


## 4. Exercises

1. **Create a conversion agent.** Write an agent that converts units (e.g. metres to feet, Celsius to Fahrenheit). Use a simple conversion dictionary and update the dispatcher accordingly.
2. **Fallback agent.** Implement a `FallbackAgent` that always returns a polite apology and maybe suggests what kinds of questions it can answer.
3. **Parallel calls.** For queries containing multiple tasks (e.g. "Add 3 to the temperature in Paris and tell me the capital of France"), modify the dispatcher so that it calls multiple agents and combines their responses.
4. **Persistent knowledge.** Extend one of your agents to load data from a file. For example, load a larger dictionary from a CSV or JSON file located in `/home/oai/share`.
5. **Testing.** Write a suite of unit tests (using Python's `unittest` framework) to verify that each agent behaves as expected. Run your tests in a separate cell.
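
For the testing exercise, one possible starting point is sketched below. `GreetingAgent` is a hypothetical stand-in so the snippet runs on its own; in the lab you would test the agents defined above instead. The `run_tests` helper is also an assumption of this sketch: it runs the suite through `unittest.TextTestRunner` rather than `unittest.main()`, which avoids the interpreter exit that `unittest.main()` triggers by default.

```python
import unittest


class GreetingAgent:
    """Hypothetical stand-in; replace with the agents from this notebook."""
    name = "greeting"

    def can_handle(self, query: str) -> bool:
        return query.lower().startswith("hello")

    def handle(self, query: str) -> str:
        return "Hello! How can I help?"


class TestGreetingAgent(unittest.TestCase):
    def setUp(self):
        # A fresh agent per test keeps the tests independent
        self.agent = GreetingAgent()

    def test_accepts_matching_query(self):
        self.assertTrue(self.agent.can_handle("Hello there"))

    def test_rejects_other_query(self):
        self.assertFalse(self.agent.can_handle("What is 2 + 2?"))

    def test_handle_returns_text(self):
        self.assertIsInstance(self.agent.handle("hello"), str)


def run_tests() -> unittest.TestResult:
    """Run the suite without exiting the interpreter."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestGreetingAgent)
    return unittest.TextTestRunner(verbosity=2).run(suite)


result = run_tests()
print(result.wasSuccessful())  # True
```

Testing `can_handle` with both a matching and a non-matching query catches over-eager triggers, which are the most common cause of a query being routed to the wrong agent.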


Foundational LLMs & Transformers

1. Vaswani, A., et al. (2017). Attention is All You Need. Advances in Neural Information Processing Systems (NIPS 2017).
2. Brown, T. B., et al. (2020). Language Models are Few-Shot Learners. NeurIPS 2020.
3. Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL-HLT 2019.
4. OpenAI (2023). GPT-4 Technical Report. arXiv:2303.08774.
5. Touvron, H., et al. (2023). LLaMA 2: Open Foundation and Fine-Tuned Chat Models. Meta AI.

Generative AI & Sampling

6. Goodfellow, I., et al. (2014). Generative Adversarial Nets. NeurIPS 2014.
7. Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
8. Neal, R. M. (1993). Probabilistic Inference Using Markov Chain Monte Carlo Methods. Technical Report CRG-TR-93-1, University of Toronto.

Retrieval-Augmented Generation (RAG) & Knowledge Grounding

9. Lewis, P., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP. NeurIPS 2020.
10. deepset ai (2023). Haystack: Open-Source Framework for Search and RAG Applications. https://haystack.deepset.ai
11. LangChain (2023). LangChain Documentation and Cookbook. https://python.langchain.com

Evaluation & Safety

12. Papineni, K., et al. (2002). BLEU: A Method for Automatic Evaluation of Machine Translation. ACL 2002.
13. Lin, C.-Y. (2004). ROUGE: A Package for Automatic Evaluation of Summaries. ACL Workshop 2004.
14. OpenAI (2024). Evaluating Model Outputs: Faithfulness and Grounding. OpenAI Docs.
15. Guardrails AI (2024). Open-Source Guardrails Framework. https://github.com/shreyar/guardrails

Prompt Engineering & Instruction Tuning

16. White, J. (2023). The Prompting Guide. https://www.promptingguide.ai
17. Ouyang, L., et al. (2022). Training Language Models to Follow Instructions with Human Feedback. NeurIPS 2022.

Agents & Tool Use

18. Yao, S., et al. (2022). ReAct: Synergizing Reasoning and Acting in Language Models. arXiv:2210.03629.
19. LangChain (2024). LangChain Agents and Tools Documentation.
20. Microsoft (2023). Semantic Kernel Developer Guide. https://learn.microsoft.com/en-us/semantic-kernel/
21. Google DeepMind (2024). Gemini Technical Report. arXiv:2312.11805.

State, Memory & Orchestration

22. LangGraph (2024). Stateful Agent Orchestration Framework. https://langchain-langgraph.vercel.app
23. Park, J. S., et al. (2023). Generative Agents: Interactive Simulacra of Human Behavior. arXiv:2304.03442.

Pedagogical and Course Design References

24. fast.ai (2023). fast.ai Deep Learning Course Notebooks. https://course.fast.ai
25. Ng, A. (2023). DeepLearning.AI Short Courses on Generative AI.
26. MIT 6.S191, Stanford CS324, UC Berkeley CS294-158. (2022–2024). Course Materials and Public Notebooks for ML and LLMs.