# Basic Evals

Evals turn "this agent seems to work" into repeatable, versioned evidence about correctness.
This notebook demonstrates three evaluation approaches: string matching, structured outputs, and LLM-as-a-Judge.

In [None]:
from agentic_patterns.core.agents import get_agent, run_agent

## String Matching Eval

The simplest eval checks whether the response contains the expected answer as a case-insensitive substring.
It works for factual questions with a single known answer.

In [None]:
async def test_simple_eval():
    agent = get_agent()
    prompt = "What's the capital of Nepal?"
    agent_run, _ = await run_agent(agent, prompt)
    answer = agent_run.result.output
    print(f"Agent's answer: {answer}")
    # Check if the response contains the expected answer
    assert 'kathmandu' in answer.lower()

In [None]:
await test_simple_eval()
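
A plain substring check can be brittle: it also matches inside longer words and misses accented spellings. The sketch below is one possible way to harden it, not part of the notebook's library. The helper names `normalize` and `contains_any` are hypothetical, and the sketch assumes the expected answer can be listed as a few acceptable variants.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    # NFKD splits accented letters into a base letter plus combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Replace anything that is not a word character or whitespace with a space.
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def contains_any(answer: str, expected: list[str]) -> bool:
    """Return True if any expected variant appears in the answer as whole words."""
    haystack = f" {normalize(answer)} "
    needles = [normalize(variant) for variant in expected]
    # Padding with spaces forces whole-word matches; empty variants are skipped.
    return any(bool(needle) and f" {needle} " in haystack for needle in needles)
```

In the test above, `assert contains_any(answer, ["Kathmandu"])` could replace the raw `in` check if stricter matching is wanted.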

## Structured Output Eval

Structured outputs (bool, int, enum) make assertions trivial.
The model returns a typed value instead of free-form text, so the test compares values directly and needs no text parsing.

In [None]:
async def test_simple_eval_structured():
    agent = get_agent(output_type=bool)
    prompt = "Is the capital of Nepal Kathmandu?"
    agent_run, _ = await run_agent(agent, prompt)
    answer = agent_run.result.output
    print(f"Agent's answer: {answer}")
    assert answer is True

In [None]:
await test_simple_eval_structured()
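
Typed outputs also make it easy to score several cases at once. The following is a minimal sketch under stated assumptions: `Case`, `pass_rate`, and the `Sentiment` enum are hypothetical names introduced here, and the outputs are assumed to have been collected beforehand (for example from `agent_run.result.output`). The type check is there because Python treats `True == 1` as true, which would otherwise let an `int` pass where a `bool` was expected.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


class Sentiment(Enum):
    """Hypothetical enum output type, used only for illustration."""
    POSITIVE = "positive"
    NEGATIVE = "negative"


@dataclass(frozen=True)
class Case:
    prompt: str
    expected: Any


def pass_rate(cases: list[Case], outputs: list[Any]) -> float:
    """Fraction of cases whose typed output equals the expected value exactly."""
    if len(cases) != len(outputs):
        raise ValueError("cases and outputs must have the same length")
    if not cases:
        return 0.0
    passed = sum(
        # Require the same type so that 1 does not count as True.
        type(output) is type(case.expected) and output == case.expected
        for case, output in zip(cases, outputs)
    )
    return passed / len(cases)
```

A pass rate over a fixed list of cases gives a single number that can be tracked across agent versions, rather than one assertion per test.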

## LLM-as-a-Judge

For open-ended questions without a single exact answer, use another LLM to evaluate the response.
The judge receives the question and the answer and returns a boolean verdict on whether the answer is correct.

In [None]:
async def judge(question: str, answer: str) -> None:
    """Ask a judge agent whether the answer is correct; raise AssertionError if it is not."""
    judge_agent = get_agent(output_type=bool)
    judge_prompt = f"""
# Judge

Given a question, judge if the following answer is correct.

## Question

{question}

## Answer

{answer}
"""
    agent_run, _ = await run_agent(judge_agent, judge_prompt)
    is_correct = agent_run.result.output
    print(f"Judge verdict: {is_correct}")
    assert is_correct is True

In [None]:
async def test_llm_as_a_judge():
    agent = get_agent()
    question = "Why is the sky blue?"
    agent_run, _ = await run_agent(agent, question)
    answer = agent_run.result.output
    print(f"Agent's answer: {answer}")
    await judge(question, answer)

In [None]:
await test_llm_as_a_judge()
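
The judge above asserts inside the function, which suits a single test. When verdicts need to be collected and compared, a variant that returns the verdict can be easier to work with. The sketch below illustrates that shape without calling a real model: `JudgeFn`, `build_judge_prompt`, `judge_verdict`, `stub_judge`, and `main` are hypothetical names, and `stub_judge` is a keyword-based stand-in for an LLM call, not a real judge.

```python
import asyncio  # needed only when running main() as a script
from typing import Awaitable, Callable

# Hypothetical type: any async callable that maps a judge prompt to a verdict.
JudgeFn = Callable[[str], Awaitable[bool]]


def build_judge_prompt(question: str, answer: str) -> str:
    """Render a judge prompt with the same sections the notebook uses."""
    return (
        "# Judge\n\n"
        "Given a question, judge if the following answer is correct.\n\n"
        f"## Question\n\n{question}\n\n"
        f"## Answer\n\n{answer}\n"
    )


async def judge_verdict(question: str, answer: str, ask_judge: JudgeFn) -> bool:
    """Return the judge's verdict instead of asserting on it."""
    return await ask_judge(build_judge_prompt(question, answer))


async def stub_judge(prompt: str) -> bool:
    """Stand-in for an LLM call: approves any prompt that mentions scattering."""
    return "scatter" in prompt.lower()


async def main() -> list[bool]:
    question = "Why is the sky blue?"
    answers = [
        "Air molecules scatter short blue wavelengths more than red ones.",
        "Because it reflects the ocean.",
    ]
    return [await judge_verdict(question, a, stub_judge) for a in answers]

# In a script: asyncio.run(main()); in a notebook cell: await main()
```

Passing the judge in as a parameter keeps the prompt construction testable on its own; a real judge would be a thin wrapper that sends the prompt to an agent created with `output_type=bool`.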