# Workout: Structured Outputs

## Setup
```bash
uv add openai pydantic instructor tenacity
```

---
## Drill 1: Basic JSON Mode 🟢
**Task:** Use JSON mode to extract name and age

```python
from openai import OpenAI
import json

client = OpenAI()

# Use response_format to force JSON
# Extract from: "Alice is 30 years old and works as an engineer"

response = client.chat.completions.create(
    model="gpt-4o",
    # Add JSON mode here
    messages=[
        # Remember: the word "JSON" must appear in the messages, or the API rejects the request
    ]
)

# Parse and print
```
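A minimal offline sketch of both halves of this drill, assuming you only need `name` and `age`. It builds the request payload without sending it and parses a hypothetical `message.content` string in place of a real response. JSON mode (`response_format={"type": "json_object"}`) guarantees syntactically valid JSON, not that your keys are present, so the parser still checks them. The helpers `mentions_json` and `parse_person` are illustrative names, not part of the OpenAI SDK.

```python
import json

# Payload you would pass as client.chat.completions.create(**request).
# JSON mode requires "JSON" to appear somewhere in the messages,
# so the system prompt names it explicitly.
request = {
    "model": "gpt-4o",
    "response_format": {"type": "json_object"},
    "messages": [
        {
            "role": "system",
            "content": "Return a JSON object with keys 'name' (string) and 'age' (integer).",
        },
        {"role": "user", "content": "Alice is 30 years old and works as an engineer"},
    ],
}


def mentions_json(messages: list[dict]) -> bool:
    """Hypothetical pre-flight check; conservatively looks for the uppercase spelling."""
    return any("JSON" in m["content"] for m in messages)


def parse_person(content: str) -> dict:
    """Parse message.content; JSON mode guarantees valid JSON, not these keys."""
    data = json.loads(content)
    missing = {"name", "age"} - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


# Hypothetical stand-in for response.choices[0].message.content
fake_content = '{"name": "Alice", "age": 30}'
person = parse_person(fake_content)
print(person["name"], person["age"])  # Alice 30
```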

---
## Drill 2: Pydantic Validation 🟡
**Task:** Create a Pydantic model and validate LLM output

```python
from pydantic import BaseModel, Field
import json

class Product(BaseModel):
    name: str = Field(min_length=1)
    price: float = Field(gt=0)
    category: str

# Test with these LLM outputs - which pass validation?
outputs = [
    '{"name": "iPhone", "price": 999, "category": "phone"}',
    '{"name": "", "price": 999, "category": "phone"}',
    '{"name": "iPhone", "price": -10, "category": "phone"}',
    '{"name": "iPhone", "price": 999}',  # Missing category
]

for output in outputs:
    # Try to validate and print result
    pass
```
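One way to check the four outputs, sketched with Pydantic v2's `model_validate_json`, which parses and validates in a single call. Only the first output passes. The integer `999` is accepted for a `float` field because Pydantic's default (lax) mode converts it. The helper name `check` is illustrative.

```python
from pydantic import BaseModel, Field, ValidationError


class Product(BaseModel):
    name: str = Field(min_length=1)
    price: float = Field(gt=0)
    category: str


def check(raw: str) -> str:
    """Return 'ok' or '<field>: <error type>' for the first validation error."""
    try:
        Product.model_validate_json(raw)
        return "ok"
    except ValidationError as e:
        first = e.errors()[0]
        return f"{first['loc'][0]}: {first['type']}"


outputs = [
    '{"name": "iPhone", "price": 999, "category": "phone"}',
    '{"name": "", "price": 999, "category": "phone"}',
    '{"name": "iPhone", "price": -10, "category": "phone"}',
    '{"name": "iPhone", "price": 999}',
]
results = [check(o) for o in outputs]
for output, result in zip(outputs, results):
    print(f"{result:<24} {output}")
```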

---
## Drill 3: Handle Parsing Errors 🟡
**Task:** Create a safe extraction function

```python
from pydantic import BaseModel, ValidationError
import json
from typing import TypeVar, Type

T = TypeVar("T", bound=BaseModel)

def safe_extract(response: str, model: Type[T]) -> T | None:
    """
    Safely extract and validate an LLM response.
    Return None if JSON parsing or validation fails.
    Log the error.
    """
    pass

# Test
class User(BaseModel):
    name: str
    age: int

print(safe_extract('{"name": "Alice", "age": 30}', User))  # Valid
print(safe_extract('invalid json', User))                   # Invalid JSON
print(safe_extract('{"name": "Alice"}', User))              # Missing age
```

---
## Drill 4: Instructor Basic 🟢
**Task:** Use Instructor for automatic structured output

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

# Wrap the client so chat.completions.create() accepts response_model
client = instructor.from_openai(OpenAI())

class MovieReview(BaseModel):
    title: str
    rating: int  # 1-10
    summary: str

# Extract from natural text
text = """
I just watched Inception and it blew my mind! Christopher Nolan
really outdid himself. I'd give it a solid 9 out of 10. The dream
within a dream concept was fascinating.
"""

# Use response_model parameter
```
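Instructor turns `response_model` into a schema the model sees and validates the reply against it. A `# 1-10` comment is invisible to both, so this optional variation moves the range into `Field(ge=1, le=10)` and inspects what Pydantic generates with `model_json_schema()`. The schema Instructor actually sends may differ in detail; this sketch runs offline and shows only Pydantic's side. The field descriptions are illustrative.

```python
from pydantic import BaseModel, Field, ValidationError


class MovieReview(BaseModel):
    title: str = Field(description="Movie title")
    # The range now lives in the schema instead of a comment
    rating: int = Field(ge=1, le=10, description="Reviewer's score from 1 to 10")
    summary: str = Field(description="One or two sentence summary of the review")


schema = MovieReview.model_json_schema()
print(schema["properties"]["rating"])

# A hypothetical out-of-range answer is now rejected instead of passing silently
try:
    MovieReview(title="Inception", rating=15, summary="Great film")
    rejected = False
except ValidationError:
    rejected = True
```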

---
## Drill 5: Complex Nested Schema 🔴
**Task:** Define and extract a complex nested structure

```python
from pydantic import BaseModel, Field
from typing import Literal
from enum import Enum

# Define schema for extracting a meeting summary:
# - title: str
# - date: str
# - attendees: list of {name: str, role: str}
# - action_items: list of {task: str, assignee: str, priority: high/medium/low}
# - next_steps: str

class Attendee(BaseModel):
    pass

class ActionItem(BaseModel):
    pass

class MeetingSummary(BaseModel):
    pass

# Test extraction on meeting notes
meeting_notes = """
Project Kickoff Meeting - January 15, 2024

Attendees:
- Sarah (Project Manager)
- John (Developer)
- Lisa (Designer)

Action Items:
1. John to set up development environment - High priority
2. Lisa to create mockups - Medium priority
3. Sarah to schedule weekly syncs - Low priority

Next steps: Reconvene next Monday to review progress.
"""
```

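One possible schema, sketched with Pydantic v2. `Literal["high", "medium", "low"]` restricts priority, and a `mode="before"` field validator lowercases the value first, since the notes write "High priority" and a model may copy that casing. The payload below is hand-written to mirror `meeting_notes`; it is not real model output.

```python
from typing import Literal

from pydantic import BaseModel, field_validator


class Attendee(BaseModel):
    name: str
    role: str


class ActionItem(BaseModel):
    task: str
    assignee: str
    priority: Literal["high", "medium", "low"]

    @field_validator("priority", mode="before")
    @classmethod
    def lowercase_priority(cls, v):
        # Normalize "High" -> "high" before the Literal check runs
        return v.strip().lower() if isinstance(v, str) else v


class MeetingSummary(BaseModel):
    title: str
    date: str
    attendees: list[Attendee]
    action_items: list[ActionItem]
    next_steps: str


# Hypothetical payload mirroring the notes (not real model output)
payload = {
    "title": "Project Kickoff Meeting",
    "date": "January 15, 2024",
    "attendees": [
        {"name": "Sarah", "role": "Project Manager"},
        {"name": "John", "role": "Developer"},
        {"name": "Lisa", "role": "Designer"},
    ],
    "action_items": [
        {"task": "Set up development environment", "assignee": "John", "priority": "High"},
        {"task": "Create mockups", "assignee": "Lisa", "priority": "Medium"},
        {"task": "Schedule weekly syncs", "assignee": "Sarah", "priority": "Low"},
    ],
    "next_steps": "Reconvene next Monday to review progress.",
}
summary = MeetingSummary.model_validate(payload)
print([item.priority for item in summary.action_items])  # ['high', 'medium', 'low']
```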
---
## Drill 6: Retry with Error Feedback 🔴
**Task:** Implement a retry loop that feeds validation errors back to the LLM

```python
from openai import OpenAI
from pydantic import BaseModel, ValidationError
import json

client = OpenAI()

class StrictUser(BaseModel):
    name: str
    age: int
    email: str  # Required!

def extract_with_retry(text: str, max_retries: int = 3) -> StrictUser | None:
    """
    Extract user info with retry.
    On validation failure, send the error back to the LLM.
    """
    messages = [
        {"role": "system", "content": "Extract user info as JSON with name, age, email."},
        {"role": "user", "content": text}
    ]

    # Implement retry loop with error feedback
    pass

# Test: the text has no email, so the first attempt fails validation.
# Watch whether the model invents an address or the loop gives up after max_retries.
result = extract_with_retry("Alice is 30 years old")
```

---
## Drill 7: Function Calling 🟡
**Task:** Use OpenAI function calling for structured extraction

```python
from openai import OpenAI
import json

client = OpenAI()

# Define tool for extracting product info
tools = [
    {
        "type": "function",
        "function": {
            "name": "extract_product",
            "description": "Extract product information",
            "parameters": {
                "type": "object",
                "properties": {
                    # Define: name, price, category, in_stock (bool)
                },
                "required": ["name", "price"]
            }
        }
    }
]

# Extract from: "The new MacBook Pro is $2499, it's a laptop and currently available"

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "..."}],
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "extract_product"}}
)

# Parse tool call
```
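A sketch of one possible `properties` block and of the parsing step, run offline. The API returns a tool call's arguments as a JSON string on `response.choices[0].message.tool_calls[0].function.arguments`; `fake_arguments` below is a hypothetical stand-in for that string. Without strict schema enforcement, the arguments are not guaranteed to match the schema, so the hypothetical helper `parse_tool_arguments` checks required and unexpected keys; for type checks, validate with a Pydantic model as in Drill 2.

```python
import json

# One possible filling of the drill's "properties" (descriptions are illustrative)
parameters = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "Product name"},
        "price": {"type": "number", "description": "Price as a number, without the currency symbol"},
        "category": {"type": "string", "description": "Product category, e.g. laptop"},
        "in_stock": {"type": "boolean", "description": "Whether the product is currently available"},
    },
    "required": ["name", "price"],
}


def parse_tool_arguments(arguments: str, schema: dict) -> dict:
    """Decode a tool call's arguments string and check keys against the schema."""
    data = json.loads(arguments)
    missing = [key for key in schema["required"] if key not in data]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    unknown = sorted(set(data) - set(schema["properties"]))
    if unknown:
        raise ValueError(f"unexpected arguments: {unknown}")
    return data


# Hypothetical stand-in for the arguments string returned by the API
fake_arguments = '{"name": "MacBook Pro", "price": 2499, "category": "laptop", "in_stock": true}'
product = parse_tool_arguments(fake_arguments, parameters)
print(product)
```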

---
## Drill 8: Instructor with Retries 🟡
**Task:** Configure Instructor with custom retry settings

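```python
import instructor
from instructor.exceptions import InstructorRetryException
from openai import OpenAI
from pydantic import BaseModel, Field

client = instructor.from_openai(OpenAI())

class StrictNumber(BaseModel):
    value: int = Field(ge=1, le=100, description="A number between 1 and 100")
    explanation: str = Field(max_length=50)

# Extract with max_retries.
# 150 violates le=100, so a faithful answer fails validation every time: Instructor
# re-asks with the validation error until max_retries is used up, then raises.

try:
    result = client.chat.completions.create(
        model="gpt-4o",
        response_model=StrictNumber,
        max_retries=3,
        messages=[
            {"role": "user", "content": "Give me the number one hundred and fifty"}
        ]
    )
    print(result)  # If this succeeds, check whether the model changed the number
except InstructorRetryException as e:
    print(f"All retries failed: {e}")
```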
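Retries only help when the model can produce a valid answer. Here a reply that honestly says 150 fails `le=100` on every attempt, and a retry "succeeds" only if the model changes the number, which quietly alters the answer. This offline sketch uses plain Pydantic to list the error types that three hypothetical replies would trigger, which is the information a re-ask has to act on; Instructor's actual re-ask wording is not reproduced here.

```python
from pydantic import BaseModel, Field, ValidationError


class StrictNumber(BaseModel):
    value: int = Field(ge=1, le=100, description="A number between 1 and 100")
    explanation: str = Field(max_length=50)


# Hypothetical replies a model might give across attempts
candidates = [
    {"value": 150, "explanation": "The user asked for one hundred and fifty"},
    {"value": 100, "explanation": "150 is above the allowed maximum of 100, so I clamped it to 100"},
    {"value": 100, "explanation": "Clamped 150 to the allowed maximum"},
]


def error_types(candidate: dict) -> list[str]:
    """Return the Pydantic error types for one reply (empty list means valid)."""
    try:
        StrictNumber.model_validate(candidate)
        return []
    except ValidationError as e:
        return [err["type"] for err in e.errors()]


for c in candidates:
    print(c["value"], error_types(c))
```

Only the third reply validates, and it does so by returning 100 instead of the requested 150.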

---
## Drill 9: Batch Extraction 🔴
**Task:** Extract multiple items from a single response

```python
from pydantic import BaseModel
import instructor
from openai import OpenAI

client = instructor.from_openai(OpenAI())

class Person(BaseModel):
    name: str
    role: str

class TeamExtraction(BaseModel):
    team_name: str
    members: list[Person]

# Extract from this text
text = """
The Avengers team consists of:
- Tony Stark (Iron Man)
- Steve Rogers (Captain America)
- Natasha Romanoff (Black Widow)
- Bruce Banner (Hulk)
"""

# Should get team_name="Avengers" and 4 members
```
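A sketch of tightening the schema so the expected result is less ambiguous, assuming Pydantic v2. Field descriptions tell the model that `role` is the name in parentheses and that `team_name` should drop the leading "The"; `min_length=1` makes an empty `members` list a validation error, which Instructor's `max_retries` would then re-ask about. The payload is hand-written to mirror the text, not real model output.

```python
from pydantic import BaseModel, Field, ValidationError


class Person(BaseModel):
    name: str = Field(description="Real name, e.g. 'Tony Stark'")
    role: str = Field(description="Role or alias given in parentheses, e.g. 'Iron Man'")


class TeamExtraction(BaseModel):
    team_name: str = Field(description="Team name without a leading article, e.g. 'Avengers'")
    members: list[Person] = Field(min_length=1, description="Every member listed, in order")


# Hypothetical payload mirroring the text (not real model output)
payload = {
    "team_name": "Avengers",
    "members": [
        {"name": "Tony Stark", "role": "Iron Man"},
        {"name": "Steve Rogers", "role": "Captain America"},
        {"name": "Natasha Romanoff", "role": "Black Widow"},
        {"name": "Bruce Banner", "role": "Hulk"},
    ],
}
team = TeamExtraction.model_validate(payload)
print(team.team_name, len(team.members))  # Avengers 4

# An empty list is rejected rather than accepted as "no members found"
try:
    TeamExtraction(team_name="Avengers", members=[])
    empty_rejected = False
except ValidationError as e:
    empty_rejected = e.errors()[0]["type"] == "too_short"
```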

---
## Drill 10: Extraction Pipeline 🔴
**Task:** Build a complete extraction pipeline class

```python
from dataclasses import dataclass
from typing import TypeVar, Type, Generic
from pydantic import BaseModel
import instructor
from openai import OpenAI

T = TypeVar("T", bound=BaseModel)

@dataclass
class ExtractionResult(Generic[T]):
    success: bool
    data: T | None
    error: str | None
    tokens_used: int

class Extractor:
    def __init__(self, model: str = "gpt-4o"):
        self.client = instructor.from_openai(OpenAI())
        self.model = model
        self.total_tokens = 0

    def extract(
        self,
        text: str,
        schema: Type[T],
        context: str | None = None
    ) -> ExtractionResult[T]:
        """
        Extract structured data from text.
        Track token usage.
        Handle errors gracefully.
        """
        pass

    @property
    def total_cost(self) -> float:
        """Estimate total cost based on tokens used."""
        pass

# Test
# extractor = Extractor()

class Invoice(BaseModel):
    vendor: str
    amount: float
    date: str

# result = extractor.extract(
#     "Invoice from Acme Corp for $500 dated Jan 15 2024",
#     Invoice
# )
# print(result.data)
# print(f"Tokens: {result.tokens_used}")
```
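One way to back `tokens_used` and `total_cost`, sketched offline. It assumes you can read `prompt_tokens` and `completion_tokens` from each response's `usage` object; with Instructor, `client.chat.completions.create_with_completion(...)` returns the parsed model together with the raw completion that carries `usage`. `UsageTracker` is a hypothetical helper, and its per-million-token rates are constructor arguments because prices change. The rates in the usage lines are placeholders, not OpenAI's actual prices.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Accumulates token counts and estimates cost from caller-supplied rates."""

    input_rate_per_1m: float   # USD per 1M prompt tokens (look up current pricing)
    output_rate_per_1m: float  # USD per 1M completion tokens
    prompt_tokens: int = 0
    completion_tokens: int = 0

    def add(self, prompt_tokens: int, completion_tokens: int) -> None:
        """Record one call, e.g. from completion.usage.prompt_tokens / completion_tokens."""
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens

    @property
    def total_cost(self) -> float:
        # Input and output tokens are priced separately, so keep the counts apart
        return (
            self.prompt_tokens * self.input_rate_per_1m
            + self.completion_tokens * self.output_rate_per_1m
        ) / 1_000_000


# Placeholder rates for illustration only
tracker = UsageTracker(input_rate_per_1m=2.0, output_rate_per_1m=8.0)
tracker.add(prompt_tokens=1000, completion_tokens=200)
tracker.add(prompt_tokens=500, completion_tokens=100)
print(tracker.total_tokens, round(tracker.total_cost, 6))  # 1800 0.0054
```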

---
## Self-Check

- [ ] Can use JSON mode correctly (remember to mention JSON in the prompt)
- [ ] Can define Pydantic models for extraction
- [ ] Can handle validation errors gracefully
- [ ] Can use Instructor for automatic structured outputs
- [ ] Can implement retry logic with error feedback
- [ ] Can use function calling as an alternative