# L5: Simple Evaluation (LLM-as-judge + basic checks)

# Setup

This notebook uses **OpenAI (Python SDK v2) + LangChain v1**.

## Prereqs
1. Set your API key in the environment:

```bash
export OPENAI_API_KEY="..."
```

2. Restart the kernel after setting env vars.


In [1]:
import os

# Make sure your key is set
assert os.getenv("OPENAI_API_KEY"), "Set OPENAI_API_KEY in your environment before running."

MODEL = "gpt-5-mini"


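Next, we use an LLM judge to decide whether an answer is grounded in the retrieved context. The rubric is deliberately small: a boolean `grounded` verdict plus a 1-3 sentence rationale, returned as structured output.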

In [2]:
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

# Judge model; MODEL is defined in the setup cell above.
llm = ChatOpenAI(model=MODEL)

class Groundedness(BaseModel):
    grounded: bool = Field(..., description="True if the answer is supported by the context.")
    rationale: str = Field(..., description="1-3 sentences why.")

parser = PydanticOutputParser(pydantic_object=Groundedness)

judge_prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are a strict evaluator. "
     "Given CONTEXT and ANSWER, decide if the answer is supported. "
     "Return JSON only.\n{format_instructions}"),
    ("user", "CONTEXT:\n{context}\n\nANSWER:\n{answer}")
]).partial(format_instructions=parser.get_format_instructions())

# Chain: fill in the prompt, call the model, parse the reply into a Groundedness object.
judge = judge_prompt | llm | parser

context = """Product: RainShell Pro
Description: Fully waterproof, 3-layer shell jacket. Sealed seams. Breathable membrane.
"""

good_answer = "RainShell Pro is fully waterproof with sealed seams and a breathable 3-layer membrane."
bad_answer = "RainShell Pro is a cotton hoodie designed for hot weather."

print("Good:", judge.invoke({"context": context, "answer": good_answer}))
print("Bad:", judge.invoke({"context": context, "answer": bad_answer}))


Good: grounded=True rationale='The answer restates details given in the context: the product is described as fully waterproof, has sealed seams, and uses a breathable 3-layer membrane. Therefore the claim is supported by the provided description.'
Bad: grounded=False rationale='The context describes RainShell Pro as a fully waterproof, 3-layer shell jacket with sealed seams and a breathable membrane. That contradicts the answer calling it a cotton hoodie for hot weather, so the answer is not supported.'
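
The judge above covers the LLM-as-judge half of this lesson. As a possible complement, here is a minimal sketch of a "basic check" that needs no model call: it measures what fraction of the answer's content words also appear in the context. The helper names (`content_words`, `overlap_score`, `basic_check`), the stopword list, and the 0.6 threshold are illustrative assumptions, not part of the notebook or of any library.

```python
import re

# Assumption: a tiny hand-picked stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "the", "is", "with", "for", "of", "in", "to"}


def content_words(text):
    """Return the set of lowercase alphanumeric tokens, minus stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def overlap_score(context, answer):
    """Fraction of the answer's content words that also occur in the context."""
    answer_words = content_words(answer)
    if not answer_words:
        return 0.0  # an empty answer cannot be supported
    return len(answer_words & content_words(context)) / len(answer_words)


def basic_check(context, answer, threshold=0.6):
    """Cheap lexical check; the threshold is an assumed value, tune it on your data."""
    return overlap_score(context, answer) >= threshold


# Same example data as the judge cell above.
context = """Product: RainShell Pro
Description: Fully waterproof, 3-layer shell jacket. Sealed seams. Breathable membrane.
"""
good_answer = "RainShell Pro is fully waterproof with sealed seams and a breathable 3-layer membrane."
bad_answer = "RainShell Pro is a cotton hoodie designed for hot weather."

print("Good:", round(overlap_score(context, good_answer), 2), basic_check(context, good_answer))
print("Bad:", round(overlap_score(context, bad_answer), 2), basic_check(context, bad_answer))
```

A lexical check like this is fast and deterministic, but it only catches vocabulary mismatch. An answer that reuses the context's words while contradicting them would still pass, so it is better treated as a pre-filter or sanity check alongside the judge than as a replacement for it.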
