# Building Effective Agents (with Pydantic AI)

Examples of the agentic workflows discussed in
[Building Effective Agents](https://www.anthropic.com/research/building-effective-agents)
by [Erik Schluntz](https://github.com/eschluntz) and [Barry Zhang](https://github.com/ItsBarryZ)
of Anthropic, inspired by the authors'
[code samples](https://github.com/anthropics/anthropic-cookbook/tree/main/patterns/agents)
and ported and adapted to [Pydantic AI](https://ai.pydantic.dev/).

## Evaluator - Optimizer
Examples are based on [Intellectronica - Building Effective Agents with Pydantic AI](https://github.com/intellectronica/building-effective-agents-with-pydantic-ai)

In [1]:
%pip install -r requirements.txt
from IPython.display import clear_output ; clear_output()

In [2]:
from util import initialize, show
AI_MODEL = initialize()

from typing import List, Dict

from pydantic import BaseModel, Field
from pydantic_ai import Agent

Available AI models:
['azure:gpt-4o', 'azure:gpt-4o-mini']

Using AI model: azure:gpt-4o
Configuring Azure AI Foundry model: gpt-4o at https://agent-workshop-yrkd.cognitiveservices.azure.com/


### Workflow: Evaluator - Optimizer

A single call to an LLM with a good prompt and sufficient context often yields
satisfactory results, but the first attempt isn't always the best we can
achieve. By having the LLM generate a result, evaluate it, and propose
improvements in a loop, we can often reach noticeably higher quality.
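
The control flow of this workflow can be sketched without any LLM at all. The snippet below is a minimal illustration, not the notebook's implementation: `refine`, `toy_generate`, `toy_evaluate` and the `max_rounds` cap are hypothetical names introduced here, and the two toy functions are deterministic stand-ins for the generator and evaluator calls.

```python
from typing import Callable


def refine(
    generate: Callable[[str], str],
    evaluate: Callable[[str], tuple[bool, str]],
    max_rounds: int = 5,
) -> tuple[str, int]:
    """Alternate generation and evaluation until the evaluator accepts.

    Returns the last candidate and the number of rounds used.
    """
    feedback = ""
    candidate = ""
    for round_number in range(1, max_rounds + 1):
        # The generator sees the feedback from the previous round (if any).
        candidate = generate(feedback)
        accepted, feedback = evaluate(candidate)
        if accepted:
            return candidate, round_number
    # Give up after max_rounds and return the most recent attempt.
    return candidate, max_rounds


# Hypothetical stand-ins for the two LLM calls.
def toy_generate(feedback: str) -> str:
    return "hello world" if not feedback else "Hello, world!"


def toy_evaluate(candidate: str) -> tuple[bool, str]:
    if candidate[0].isupper() and candidate.endswith("!"):
        return True, ""
    return False, "Capitalize the first letter and end with '!'."


final, rounds = refine(toy_generate, toy_evaluate)
print(final, rounds)
```

The cap on rounds is a design choice of this sketch: an evaluator that never accepts would otherwise keep the loop running indefinitely.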

> <img src="https://ai.pydantic.dev/img/pydantic-ai-dark.svg" style="height: 1em;" />
> The schema definition derived from the Pydantic models we define is primarily
> used to control the result we read from the LLM call, but in many cases it
> is also possible to use it to instruct the LLM on the desired behaviour.
> Here, for example, we use a `thoughts` field to get the LLM to engage in
> "chain-of-thought" generation, which helps it in reasoning. By generating the
> content of this field, the LLM directs itself towards a more detailed and precise
> response. Even if we don't need to read the generated value, we can still inspect
> it while debugging, or with an observability tool like Pydantic Logfire, to understand
> how the LLM approaches the challenge.
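
To see what the model definition contributes, it can help to print the JSON schema that Pydantic derives from it. The sketch below assumes Pydantic v2 (`model_json_schema`); the `Answer` model is hypothetical and only mirrors the shape of the models used in this notebook. How the schema is actually delivered to the LLM depends on Pydantic AI and the provider, so treat this as an illustration of the schema itself rather than of the request.

```python
from pydantic import BaseModel, Field


class Answer(BaseModel):  # hypothetical model, for illustration only
    # Declared first so that reasoning is generated before the answer.
    thoughts: str = Field(..., description='Step-by-step reasoning before answering.')
    response: str = Field(..., description='The final answer.')


schema = Answer.model_json_schema()

# Each field description ends up in the schema next to the field's type.
for name, spec in schema['properties'].items():
    print(f"{name} ({spec['type']}): {spec['description']}")
```

Because the descriptions travel with the schema, they act as per-field instructions in addition to whatever the system prompt says.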

In [3]:
class GeneratorResponse(BaseModel):
    thoughts: str = Field(..., description=(
        'Your understanding of the task and feedback '
        'and how you plan to improve.'
    ))
    response: str = Field(..., description='The generated solution.')


async def generate(prompt: str, task: str, context: str = "") -> tuple[str, str]:
    """Generate and improve a solution based on feedback."""
    system_prompt = prompt
    if context:
        system_prompt += f"\n\n{context}"

    generator_agent = Agent(
        AI_MODEL,
        system_prompt=system_prompt,
        output_type=GeneratorResponse,
    )
    response = await generator_agent.run(f'Task:\n{task}')

    thoughts = response.output.thoughts
    result = response.output.response
    
    show('', title='Generation')
    show(thoughts, title='Thoughts')
    show(result, title='Generated')
    
    return thoughts, result


class EvaluatorResponse(BaseModel):
    thoughts: str = Field(..., description=(
        'Your careful and detailed review and evaluation of the submitted content.'
    ))
    evaluation: str = Field(..., description='PASS, NEEDS_IMPROVEMENT, or FAIL')
    feedback: str = Field(..., description='What needs improvement and why.')


async def evaluate(prompt: str, content: str, task: str) -> tuple[str, str]:
    """Evaluate if a solution meets requirements."""
    evaluator_agent = Agent(
        AI_MODEL,
        system_prompt=f'{prompt}\n\nTask:\n{task}',
        output_type=EvaluatorResponse,
    )
    response = await evaluator_agent.run(content)
    evaluation = response.output.evaluation
    feedback = response.output.feedback
    
    show('', title='Evaluation')
    show(evaluation, title='Status')
    show(feedback, title='Feedback')
    
    return evaluation, feedback


async def loop(
        task: str, evaluator_prompt: str, generator_prompt: str
    ) -> tuple[str, list[dict]]:
    """Keep generating and evaluating until requirements are met."""
    memory = []
    chain_of_thought = []
    
    thoughts, result = await generate(generator_prompt, task)
    memory.append(result)
    chain_of_thought.append({"thoughts": thoughts, "result": result})
    
    while True:
        evaluation, feedback = await evaluate(evaluator_prompt, result, task)
        if evaluation == "PASS":
            return result, chain_of_thought
            
        context = "\n".join([
            "Previous attempts:",
            *[f"- {m}" for m in memory],
            f"\nFeedback: {feedback}"
        ])
        
        thoughts, result = await generate(generator_prompt, task, context)
        memory.append(result)
        chain_of_thought.append({"thoughts": thoughts, "result": result})
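
The loop above stops only when `evaluation` is exactly the string `"PASS"`, and `evaluation` is declared as a free-form `str`, so a reply such as `"Pass"` or `"PASS."` would trigger another round. One possible way to make the comparison more tolerant is sketched below. `Status` and `parse_status` are hypothetical helpers that are not part of the notebook, and treating unrecognized text as `FAIL` is an assumption of this sketch.

```python
from enum import Enum


class Status(str, Enum):
    PASS = "PASS"
    NEEDS_IMPROVEMENT = "NEEDS_IMPROVEMENT"
    FAIL = "FAIL"


def parse_status(raw: str) -> Status:
    """Map free-form evaluator text to a Status, defaulting to FAIL."""
    # Drop surrounding whitespace, quotes and trailing periods,
    # then normalize case and spaces.
    cleaned = raw.strip().strip('."\'').upper().replace(" ", "_")
    try:
        return Status(cleaned)
    except ValueError:
        # Anything unrecognized is treated as a failure (assumption).
        return Status.FAIL


print(parse_status(" pass. "))
print(parse_status("Needs improvement"))
print(parse_status("looks good"))
```

With a helper like this, the check in `loop` could read `parse_status(evaluation) is Status.PASS` instead of comparing raw strings.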

In [4]:
evaluator_prompt = """
Evaluate the following code implementation for:
1. Code correctness: does it implement what is required in the spec flawlessly?
2. Time complexity: does the implementation meet the time complexity requirements?
3. Efficiency: is the implementation the most efficient and optimized possible for the requirements?
4. Style and best practices: does the code follow standard Python style and best practices?
5. Readability: is the code easy to read and understand?
6. Documentation: is the code clearly documented, with docstrings for all functions and classes, and with inline comments where necessary?

You should be evaluating only and not attempting to solve the task.
Evaluate the code carefully and critically and make sure you don't
miss any opportunities for improvement.
Only output "PASS" if all the evaluation criteria are met and you
have no further suggestions for improvements, otherwise output
"NEEDS_IMPROVEMENT" or "FAIL" so that the coder can learn and improve.
"""

generator_prompt = """
Your goal is to complete the task based on the user input. If there is feedback
from your previous generations, you should reflect on it to improve your solution."""

task = """
Implement a Stack with:
1. push(x)
2. pop()
3. getMin()
All operations should be O(1).
"""

result, chain_of_thought = await loop(task, evaluator_prompt, generator_prompt)

show(result, title='Final Result')
show(chain_of_thought, title='Chain of Thought')

### Generation

### Thoughts

```
To address the task, I'll create a stack implementation in Python that handles the regular push and pop operations while also maintaining a minimum stack to allow the getMin operation to be O(1). The push operation will involve adding elements to both the main stack and the min stack (only if the current element is the new minimum so far). The pop operation will involve removing elements from both stacks. By maintaining both a stack and a min stack, we can ensure that getMin can be performed in constant time.
```

### Generated

```python
class MinStack:
    def __init__(self):
        self.stack = []
        self.min_stack = []

    def push(self, x):
        """Pushes an element onto the stack and updates the min stack."""
        self.stack.append(x)
        if not self.min_stack or x <= self.min_stack[-1]:
            self.min_stack.append(x)

    def pop(self):
        """Removes the element on top of the stack and updates the min stack."""
        if not self.stack:
            return None

        top_element = self.stack.pop()
        if top_element == self.min_stack[-1]:
            self.min_stack.pop()
        return top_element

    def getMin(self):
        """Gets the minimum element in the stack."""
        if not self.min_stack:
            return None
        return self.min_stack[-1]
```

In this implementation:
- `push(x)` adds the element `x` to the main stack and updates the `min_stack` if `x` is less than or equal to the current minimum.
- `pop()` removes the last element from the main stack and also checks if it corresponds to the current minimum to maintain the `min_stack`.
- `getMin()` simply returns the element at the top of `min_stack`, which is the current minimum in the stack. 

All operations are executed in O(1) time complexity.

### Evaluation

### Status

```
PASS
```

### Feedback

```
The implementation is solidly designed and fulfills all the requirements of the task. If you'd like to improve it further:
1. Consider handling edge cases or invalid operations, such as popping from an empty stack, by raising exceptions rather than returning None, aligning with typical stack behavior.
2. Continue to maintain clear documentation and appropriate error handling in production-level code, especially for critical data structures like a stack.
```

### Final Result

```python
class MinStack:
    def __init__(self):
        self.stack = []
        self.min_stack = []

    def push(self, x):
        """Pushes an element onto the stack and updates the min stack."""
        self.stack.append(x)
        if not self.min_stack or x <= self.min_stack[-1]:
            self.min_stack.append(x)

    def pop(self):
        """Removes the element on top of the stack and updates the min stack."""
        if not self.stack:
            return None

        top_element = self.stack.pop()
        if top_element == self.min_stack[-1]:
            self.min_stack.pop()
        return top_element

    def getMin(self):
        """Gets the minimum element in the stack."""
        if not self.min_stack:
            return None
        return self.min_stack[-1]
```

In this implementation:
- `push(x)` adds the element `x` to the main stack and updates the `min_stack` if `x` is less than or equal to the current minimum.
- `pop()` removes the last element from the main stack and also checks if it corresponds to the current minimum to maintain the `min_stack`.
- `getMin()` simply returns the element at the top of `min_stack`, which is the current minimum in the stack. 

All operations are executed in O(1) time complexity.

### Chain of Thought

```
[{'result': '```python\n'
            'class MinStack:\n'
            '    def __init__(self):\n'
            '        self.stack = []\n'
            '        self.min_stack = []\n'
            '\n'
            '    def push(self, x):\n'
            '        """Pushes an element onto the stack and updates the min '
            'stack."""\n'
            '        self.stack.append(x)\n'
            '        if not self.min_stack or x <= self.min_stack[-1]:\n'
            '            self.min_stack.append(x)\n'
            '\n'
            '    def pop(self):\n'
            '        """Removes the element on top of the stack and updates '
            'the min stack."""\n'
            '        if not self.stack:\n'
            '            return None\n'
            '\n'
            '        top_element = self.stack.pop()\n'
            '        if top_element == self.min_stack[-1]:\n'
            '            self.min_stack.pop()\n'
            '        return top_element\n'
            '\n'
            '    def getMin(self):\n'
            '        """Gets the minimum element in the stack."""\n'
            '        if not self.min_stack:\n'
            '            return None\n'
            '        return self.min_stack[-1]\n'
            '```\n'
            '\n'
            'In this implementation:\n'
            '- `push(x)` adds the element `x` to the main stack and updates '
            'the `min_stack` if `x` is less than or equal to the current '
            'minimum.\n'
            '- `pop()` removes the last element from the main stack and also '
            'checks if it corresponds to the current minimum to maintain the '
            '`min_stack`.\n'
            '- `getMin()` simply returns the element at the top of '
            '`min_stack`, which is the current minimum in the stack. \n'
            '\n'
            'All operations are executed in O(1) time complexity.',
  'thoughts': "To address the task, I'll create a stack implementation in "
              'Python that handles the regular push and pop operations while '
              'also maintaining a minimum stack to allow the getMin operation '
              'to be O(1). The push operation will involve adding elements to '
              'both the main stack and the min stack (only if the current '
              'element is the new minimum so far). The pop operation will '
              'involve removing elements from both stacks. By maintaining both '
              'a stack and a min stack, we can ensure that getMin can be '
              'performed in constant time.'}]
```
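
Although the evaluator returned `PASS` in this run, its feedback suggested raising exceptions instead of returning `None` for operations on an empty stack. A sketch of what that change might look like follows. `StrictMinStack` is a hypothetical name, and the choice of `IndexError` is an assumption made here to mirror the behaviour of `list.pop()` on an empty list; it was not produced by the run above.

```python
class StrictMinStack:
    """Variant of the generated MinStack that raises on empty access."""

    def __init__(self):
        self.stack = []
        self.min_stack = []

    def push(self, x):
        """Push x and record it if it is a new (or equal) minimum."""
        self.stack.append(x)
        if not self.min_stack or x <= self.min_stack[-1]:
            self.min_stack.append(x)

    def pop(self):
        """Remove and return the top element; raise if the stack is empty."""
        if not self.stack:
            raise IndexError("pop from empty stack")
        top = self.stack.pop()
        if top == self.min_stack[-1]:
            self.min_stack.pop()
        return top

    def getMin(self):
        """Return the current minimum; raise if the stack is empty."""
        if not self.min_stack:
            raise IndexError("getMin from empty stack")
        return self.min_stack[-1]


s = StrictMinStack()
for value in (3, 1, 1, 2):
    s.push(value)
print(s.getMin())
s.pop()
s.pop()
print(s.getMin())
```

All three operations remain O(1); only the handling of the empty case changes.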