# Building Effective Agents (with Pydantic AI)

Examples for the agentic workflows discussed in
[Building Effective Agents](https://www.anthropic.com/research/building-effective-agents)
by [Erik Schluntz](https://github.com/eschluntz) and [Barry Zhang](https://github.com/ItsBarryZ)
of Anthropic, ported and adapted from the authors'
[code samples](https://github.com/anthropics/anthropic-cookbook/tree/main/patterns/agents)
to use [Pydantic AI](https://ai.pydantic.dev/).

## Evaluator - Optimizer
Examples copied from [Intellectronica - Building Effective Agents with Pydantic AI](https://github.com/intellectronica/building-effective-agents-with-pydantic-ai)

In [6]:
%pip install -r requirements.txt
from IPython.display import clear_output ; clear_output()

In [7]:
from util import initialize, show
AI_MODEL = initialize()

from typing import List, Dict

from pydantic import BaseModel, Field
from pydantic_ai import Agent

Available AI models:
['openai:gpt-4o', 'openai:gpt-4o-mini']

Using AI model: openai:gpt-4o


### Workflow: Evaluator - Optimizer

A single call to an LLM with a good prompt and sufficient context often yields
satisfactory results, but the first attempt isn't always the best we can
achieve. By having the LLM generate a result, evaluate it, and propose
improvements, then repeating until the evaluation passes, we can often reach
noticeably higher quality.
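
The control flow of this workflow can be sketched without any LLM at all. The snippet below is a minimal illustration, not the notebook's implementation: `refine`, `toy_generate`, `toy_evaluate` and the `max_rounds` cap are hypothetical names introduced here, and the two toy functions stand in for the generator and evaluator LLM calls.

```python
from typing import Callable


def refine(
    task: str,
    generate: Callable[[str, str], str],
    evaluate: Callable[[str, str], tuple[str, str]],
    max_rounds: int = 5,
) -> tuple[str, int]:
    """Generate, evaluate, and regenerate with feedback until PASS or max_rounds."""
    feedback = ""
    result = ""
    for round_number in range(1, max_rounds + 1):
        result = generate(task, feedback)
        status, feedback = evaluate(task, result)
        if status == "PASS":
            return result, round_number
    # Give up after max_rounds and return the latest attempt.
    return result, max_rounds


# Toy stand-ins for the two LLM calls (hypothetical, for illustration only).
def toy_generate(task: str, feedback: str) -> str:
    # Without feedback the draft has no docstring; with feedback it adds one.
    body = "def add(a, b):\n"
    if feedback:
        body += '    """Return the sum of a and b."""\n'
    return body + "    return a + b\n"


def toy_evaluate(task: str, result: str) -> tuple[str, str]:
    # Pass only when the draft contains a docstring.
    if '"""' in result:
        return "PASS", ""
    return "NEEDS_IMPROVEMENT", "Add a docstring."


final, rounds = refine("Write add(a, b).", toy_generate, toy_evaluate)
print(rounds)  # 2
```

The cap on rounds is a design choice of this sketch: it keeps a strict evaluator from triggering an unbounded number of model calls.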

> <img src="https://ai.pydantic.dev/img/pydantic-ai-dark.svg" style="height: 1em;" />
> The schema definition derived from the Pydantic models we define is primarily
> used to control the result we read from the LLM call, but in many cases it
> is also possible to use it to instruct the LLM on the desired behaviour.
> Here, for example, we use a `thoughts` field to get the LLM to engage in
> "chain-of-thought" generation, which helps it in reasoning. By generating the
> content of this field, the LLM directs itself towards a more detailed and precise
> response. Even if we don't need to read the generated value, we can still inspect
> it while debugging, or with an observability tool like Pydantic Logfire, to understand
> how the LLM approaches the challenge.
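
To see where those field descriptions end up, the sketch below prints the JSON schema that Pydantic derives from a model. The `Answer` model is a hypothetical stand-in rather than part of the notebook, and the snippet assumes Pydantic v2, where `model_json_schema()` is available. How the schema is passed to a particular provider is handled by Pydantic AI and is not shown here.

```python
import json

from pydantic import BaseModel, Field


class Answer(BaseModel):
    # Declared first, so it appears first in the schema's properties.
    thoughts: str = Field(..., description='Step-by-step reasoning before answering.')
    response: str = Field(..., description='The final answer.')


schema = Answer.model_json_schema()

# Each field's description is embedded in the schema next to its type.
print(json.dumps(schema, indent=2))
```

Placing `thoughts` before `response` is deliberate in this sketch: the field order of the model is preserved in the schema, so the reasoning field comes before the answer field.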

In [12]:
class GeneratorResponse(BaseModel):
    thoughts: str = Field(..., description=(
        'Your understanding of the task and feedback '
        'and how you plan to improve.'
    ))
    response: str = Field(..., description='The generated solution.')


async def generate(prompt: str, task: str, context: str = "") -> tuple[str, str]:
    """Generate and improve a solution based on feedback."""
    system_prompt = prompt
    if context:
        system_prompt += f"\n\n{context}"

    generator_agent = Agent(
        AI_MODEL,
        system_prompt=system_prompt,
        output_type=GeneratorResponse,
    )
    response = await generator_agent.run(f'Task:\n{task}')

    thoughts = response.output.thoughts
    result = response.output.response
    
    show('', title='Generation')
    show(thoughts, title='Thoughts')
    show(result, title='Generated')
    
    return thoughts, result


class EvaluatorResponse(BaseModel):
    thoughts: str = Field(..., description=(
        'Your careful and detailed review and evaluation of the submitted content.'
    ))
    evaluation: str = Field(..., description='PASS, NEEDS_IMPROVEMENT, or FAIL')
    feedback: str = Field(..., description='What needs improvement and why.')


async def evaluate(prompt: str, content: str, task: str) -> tuple[str, str]:
    """Evaluate if a solution meets requirements."""
    evaluator_agent = Agent(
        AI_MODEL,
        system_prompt=f'{prompt}\n\nTask:\n{task}',
        output_type=EvaluatorResponse,
    )
    response = await evaluator_agent.run(content)
    evaluation = response.output.evaluation
    feedback = response.output.feedback
    
    show('', title='Evaluation')
    show(evaluation, title='Status')
    show(feedback, title='Feedback')
    
    return evaluation, feedback


async def loop(
        task: str, evaluator_prompt: str, generator_prompt: str
    ) -> tuple[str, list[dict]]:
    """Keep generating and evaluating until requirements are met."""
    memory = []
    chain_of_thought = []
    
    thoughts, result = await generate(generator_prompt, task)
    memory.append(result)
    chain_of_thought.append({"thoughts": thoughts, "result": result})
    
    while True:
        evaluation, feedback = await evaluate(evaluator_prompt, result, task)
        if evaluation == "PASS":
            return result, chain_of_thought
            
        context = "\n".join([
            "Previous attempts:",
            *[f"- {m}" for m in memory],
            f"\nFeedback: {feedback}"
        ])
        
        thoughts, result = await generate(generator_prompt, task, context)
        memory.append(result)
        chain_of_thought.append({"thoughts": thoughts, "result": result})

In [11]:
evaluator_prompt = """
Evaluate the following code implementation for:
1. Code correctness: does it implement what is required in the spec flawlessly?
2. Time complexity: does the implementation meet the time complexity requirements?
3. Efficiency: is the implementation the most efficient and optimized possible for the requirements?
4. Style and best practices: does the code follow standard Python style and best practices?
5. Readability: is the code easy to read and understand?
6. Documentation: is the code clearly documented, with docstrings for all functions and classes, and with inline comments where necessary?

You should be evaluating only and not attempting to solve the task.
Evaluate the code carefully and critically and make sure you don't
miss any opportunities for improvement.
Only output "PASS" if all the evaluation criteria are met and you
have no further suggestions for improvements, otherwise output
"NEEDS_IMPROVEMENT" or "FAIL" so that the coder can learn and improve.
"""

generator_prompt = """
Your goal is to complete the task based on the user input. If there is feedback
from your previous generations, you should reflect on it to improve your solution."""

task = """
Implement a Stack with:
1. push(x)
2. pop()
3. getMin()
All operations should be O(1).
"""

result, chain_of_thought = await loop(task, evaluator_prompt, generator_prompt)

show(result, title='Final Result')
show(chain_of_thought, title='Chain of Thought')


Generation
----------


Thoughts
--------

The task requires implementing a stack that supports three operations: push, pop, and getMin, all in constant O(1) time complexity. To improve from previous methods and feedback, I ensure that my approach can provide constant time complexity for all operations. The approach involves using two stacks: one for the stack itself and another for keeping track of minimum values. This dual-stack approach allows updates to the minimum in constant time when elements are pushed and popped. This method has been well tested and is efficient.


Generated
---------

class StackWithMin:
    def __init__(self):
        self.stack = []
        self.min_stack = []

    def push(self, x):
        self.stack.append(x)
        # If the min_stack is empty or the new element is smaller or equal,
        # push it onto the min_stack
        if not self.min_stack or x <= self.min_stack[-1]:
            self.min_stack.append(x)

    def pop(self):
        if not self.