# Foundations Course — Week 3 Practice (Starter Notebook)

---

## Pre-study (Self-learn)

The Foundations Course assumes you have completed Self-learn. If you need a refresher on prompt engineering fundamentals and structured outputs, see:

- [Foundations Course Pre-study index](../PRESTUDY.md)
- [Self-learn — Prompt engineering and evaluation](../self_learn/Chapters/3/02_prompt_engineering_evaluation.md)
- [Self-learn — Structured outputs and schemas](../self_learn/Chapters/3/01_function_calling_structured_outputs.md)

---

This notebook provides starter code for structured outputs: JSON parsing, schema validation, and retry/repair patterns.

## What success looks like (end of practice)

- You can turn raw model text into a validated object (or a clear failure).
- You can retry/repair a failure with a hard cap.
- You saved at least one raw failure output under `output/` for inspection.

### Checkpoint

- `parse_validate_with_retry(...)` succeeds for at least one bad input.
- You can point to an output file under `output/` that captures a failed raw string.

## References (docs)

- JSON Schema (official): https://json-schema.org/
- Python `json` (official): https://docs.python.org/3/library/json.html
- Pydantic (validation): https://docs.pydantic.dev/latest/
- Tenacity (retries): https://tenacity.readthedocs.io/
- Prompt Engineering Guide (community): https://www.promptingguide.ai/
- Anthropic Cookbook (GitHub): https://github.com/anthropics/anthropic-cookbook

## What is a Prompt?

A **prompt** is the input text or instructions you send to a Large Language Model (LLM). It acts as the API contract between your code and the AI.

Typically, when using an LLM API (like OpenAI's), a prompt is broken down into structured roles:
- **System**: High-level instructions, persona, and rules (e.g., "You are a helpful Python expert. Always return JSON").
- **User**: The specific request, task, or data the user wants processed.

Let's look at a basic example of how to construct and send a prompt using the OpenAI API format.

In [None]:
import os
import json
from openai import OpenAI
from dotenv import load_dotenv

# Make sure you have a .env file with your OPENAI_API_KEY
load_dotenv()

# We initialize the client
client = OpenAI()

def call_llm(system_prompt: str, user_prompt: str) -> str:
    """
    A basic wrapper around an LLM API call demonstrating 'What is a Prompt?'.
    We separate the 'System' (rules/persona) from the 'User' (task/data).
    """
    try:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt}
            ],
            temperature=0.0,  # Minimize randomness; outputs may still vary between calls
        )
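        # content is Optional in the SDK, so fall back to "" to honor the -> str annotation
        return response.choices[0].message.content or ""
    except Exception as e:
        # Raise instead of returning the error text, so a failed call is never
        # mistaken for model output by downstream parsing.
        raise RuntimeError(f"Error calling API: {e}") from e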

# Example of a System Prompt (the "Contract")
system_prompt = """
You are a helpful Python expert. 
Your goal is to explain concepts clearly to beginners.
"""

# Example of a User Prompt (the "Input")
user_prompt = "Explain what a Python dictionary is in one sentence."

# Let's run it
output = call_llm(system_prompt, user_prompt)
print("=== LLM Response ===")
print(output)

## Setup

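Run this notebook in an environment with `pydantic` installed. The prompt example above also needs `openai` and `python-dotenv`. `tenacity` is listed in the references but is not imported by the starter code.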


In [None]:
import json
from typing import List, Optional

from pydantic import BaseModel


## Define a target schema

This schema defines what downstream code can rely on.


In [None]:
class ExtractionItem(BaseModel):
    field: str
    value: str

class ExtractionResult(BaseModel):
    items: List[ExtractionItem]
    notes: Optional[str] = None

schema_json = ExtractionResult.model_json_schema()
schema_json


## Simulate model output

We simulate one valid output and two common failure cases: invalid JSON, and valid JSON with the wrong shape (here, an item missing its `value` field).


In [None]:
raw_good = json.dumps({
    'items': [{'field': 'company', 'value': 'Acme'}],
    'notes': 'ok',
}, ensure_ascii=False)
raw_bad_json = 'items: [company=Acme]'
raw_wrong_shape = json.dumps({'items': [{'field': 'company'}]}, ensure_ascii=False)
raw_good, raw_bad_json, raw_wrong_shape


## Parse + validate helper

JSON parsing followed by schema validation turns model output into either a validated object or an explicit failure.
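
One way to make that success or failure explicit is to return a small result object instead of letting exceptions propagate. The sketch below uses only the standard library. `ParseOutcome`, `parse_to_outcome`, and the hand-written `has_expected_shape` check are hypothetical names, and the shape check is a stand-in for the Pydantic validation used in this notebook, not a replacement for it.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ParseOutcome:
    """Hypothetical result type: either parsed data or a labeled failure."""
    ok: bool
    data: Optional[Any] = None
    error_kind: Optional[str] = None  # "invalid_json" or "wrong_shape"
    error_detail: Optional[str] = None


def has_expected_shape(data: Any) -> bool:
    # Stand-in for schema validation: a dict whose 'items' is a list of
    # dicts that each carry string 'field' and 'value' entries.
    if not isinstance(data, dict) or not isinstance(data.get("items"), list):
        return False
    return all(
        isinstance(item, dict)
        and isinstance(item.get("field"), str)
        and isinstance(item.get("value"), str)
        for item in data["items"]
    )


def parse_to_outcome(raw_text: str) -> ParseOutcome:
    # Failure case 1: the text is not JSON at all.
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        return ParseOutcome(ok=False, error_kind="invalid_json", error_detail=str(exc))
    # Failure case 2: valid JSON, but not the structure downstream code expects.
    if not has_expected_shape(data):
        return ParseOutcome(
            ok=False, error_kind="wrong_shape", error_detail="missing or mistyped keys"
        )
    return ParseOutcome(ok=True, data=data)


print(parse_to_outcome('items: [company=Acme]').error_kind)
print(parse_to_outcome('{"items": [{"field": "company"}]}').error_kind)
print(parse_to_outcome('{"items": [{"field": "company", "value": "Acme"}]}').ok)
```

Labeling the failure kind makes it easier to decide what to do next: a parse error may call for a re-ask, while a shape error can often be reported field by field.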


In [None]:
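def parse_and_validate(raw_text: str) -> ExtractionResult:
    # Step 1: parse the text as JSON (raises json.JSONDecodeError on invalid JSON).
    data = json.loads(raw_text)
    # Step 2: validate against the schema (raises pydantic's ValidationError on a wrong shape).
    return ExtractionResult.model_validate(data)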


In [None]:
parse_and_validate(raw_good)


## Retry/repair wrapper (starter pattern)

In production, the repair step would typically re-prompt the model with the schema and the invalid output. Here a hard-coded `naive_repair` stands in for that call.
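
A re-prompt of that kind can be sketched as a function that packages the schema and the invalid output into chat messages. `build_repair_messages`, its `error` parameter, and the prompt wording are assumptions for illustration, not part of the starter code. The returned list follows the system/user role format shown earlier and could be passed to whichever client you use.

```python
import json
from typing import Dict, List


def build_repair_messages(schema: dict, invalid_output: str, error: str) -> List[Dict[str, str]]:
    """Hypothetical helper: build chat messages asking the model to fix its output."""
    system_prompt = (
        "You repair JSON. Return only a JSON object that conforms to the schema. "
        "Do not add commentary."
    )
    user_prompt = (
        "Schema:\n"
        f"{json.dumps(schema, indent=2)}\n\n"
        "Invalid output:\n"
        f"{invalid_output}\n\n"
        "Validation error:\n"
        f"{error}"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


# Example with a tiny hand-written schema (illustrative only).
example_schema = {"type": "object", "required": ["items"]}
messages = build_repair_messages(example_schema, "items: [company=Acme]", "Expecting value")
print(messages[1]["content"])
```

Including the error text is optional, but it gives the model a concrete hint about what to change.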


In [None]:
def naive_repair(raw_text: str) -> str:
    # TODO: replace with an LLM re-ask in the real project
    if raw_text.startswith('items:'):
        return json.dumps({
            'items': [{'field': 'company', 'value': 'Acme'}],
            'notes': 'repaired',
        }, ensure_ascii=False)
    return raw_text

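def parse_validate_with_retry(raw_text: str) -> ExtractionResult:
    # Starter version: applies one repair pass, then validates once.
    # It does not loop yet; add a capped retry loop in the real project.
    repaired = naive_repair(raw_text)
    return parse_and_validate(repaired)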

parse_validate_with_retry(raw_bad_json)
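
A hard cap on attempts, as listed in the success criteria, could look like the sketch below. The names `retry_with_cap`, `max_attempts`, and `demo_repair` are assumptions, as are the injected `parse` and `repair` callables. In this notebook you would pass `parse_and_validate` and `naive_repair` (or an LLM re-ask) instead of the stand-ins used here.

```python
import json
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_cap(
    raw_text: str,
    parse: Callable[[str], T],
    repair: Callable[[str], str],
    *,
    max_attempts: int = 3,
) -> T:
    """Parse raw_text, repairing between attempts, with at most max_attempts parses."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    current = raw_text
    for attempt in range(1, max_attempts + 1):
        try:
            return parse(current)
        except ValueError:
            # json.JSONDecodeError is a ValueError subclass.
            # Once the cap is reached, surface the last error to the caller.
            if attempt == max_attempts:
                raise
            current = repair(current)
    raise AssertionError("unreachable")


def demo_repair(text: str) -> str:
    # Stand-in repair: a hard-coded fix for one known bad input.
    return '{"items": []}' if text.startswith("items:") else text


print(retry_with_cap("items: [company=Acme]", json.loads, demo_repair))
```

Catching `ValueError` covers `json.JSONDecodeError`. Before relying on this, check that your validator's error type is also a `ValueError`, or widen the `except` clause.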


## Exercise: persist raw failures (TODO)

Implement the TODO function below.

Goal:

- When parsing/validation fails, persist the raw output under `output/`.
- Return the output path so you can reference it in a report/debugging.

Checkpoint:

- Running the cell creates a file like `output/raw_failure.txt`.

In [None]:
from pathlib import Path


OUTPUT_DIR = Path("output")
OUTPUT_DIR.mkdir(exist_ok=True)


def persist_raw_failure_todo(raw_text: str, *, filename: str = "raw_failure.txt") -> Path:
    # TODO: implement (write raw_text to out_path instead of the placeholder)
    out_path = OUTPUT_DIR / filename
    out_path.write_text("TODO\n", encoding="utf-8")
    return out_path


for raw in [raw_bad_json, raw_wrong_shape]:
    try:
        parse_and_validate(raw)
    except Exception:
        p = persist_raw_failure_todo(raw)
        print("saved raw failure to", p)

## Appendix: Solutions (peek only after trying)

Reference implementation for `persist_raw_failure_todo`.

In [None]:
def persist_raw_failure_todo(raw_text: str, *, filename: str = "raw_failure.txt") -> Path:
    out_path = OUTPUT_DIR / filename
    out_path.write_text(raw_text, encoding="utf-8")
    return out_path


for raw in [raw_bad_json, raw_wrong_shape]:
    try:
        parse_and_validate(raw)
    except Exception:
        p = persist_raw_failure_todo(raw)
        print("saved raw failure to", p)
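
Because the loop above reuses the default filename, the second failure overwrites the first. One possible variation, sketched here with hypothetical names (`persist_raw_failure_unique`, the `output_dir` parameter, the `raw_failure_` prefix), derives the filename from a hash of the content so that each distinct failure gets its own file.

```python
import hashlib
import tempfile
from pathlib import Path


def persist_raw_failure_unique(raw_text: str, *, output_dir: Path = Path("output")) -> Path:
    """Hypothetical variant: one file per distinct raw output, named by content hash."""
    output_dir.mkdir(parents=True, exist_ok=True)
    # A short SHA-256 prefix keeps names readable while separating distinct inputs.
    digest = hashlib.sha256(raw_text.encode("utf-8")).hexdigest()[:12]
    out_path = output_dir / f"raw_failure_{digest}.txt"
    out_path.write_text(raw_text, encoding="utf-8")
    return out_path


# Demo in a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    first = persist_raw_failure_unique("items: [company=Acme]", output_dir=Path(tmp))
    second = persist_raw_failure_unique('{"items": [{"field": "company"}]}', output_dir=Path(tmp))
    print(first.name)
    print(second.name)
```

Saving the same raw string twice maps to the same path, so repeated identical failures do not pile up.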