# Introduction
Large language models (LLMs) excel at chatting in free-form text. This works well for interactive Q&A.
However, in software applications we must parse outputs programmatically. Unstructured text forces developers to write brittle post-processing code. 


## Structured Outputs 

Structured outputs let us get predictable, machine-readable data from language models. It's like giving the LLM a form to fill out instead of letting it respond however it wants.



**Step 1: Define your schema**

First, define an object or data structure that represents the JSON Schema the model should be constrained to follow.

```python
from pydantic import BaseModel

class Step(BaseModel):
    explanation: str
    output: str

class MathResponse(BaseModel):
    steps: list[Step]
    final_answer: str

```
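
To see what the model is being constrained to, it can help to print the JSON Schema that Pydantic derives from these classes. The sketch below assumes Pydantic v2, where `model_json_schema()` is available; the schema the SDK actually sends may differ in details.

```python
import json
from pydantic import BaseModel


class Step(BaseModel):
    explanation: str
    output: str


class MathResponse(BaseModel):
    steps: list[Step]
    final_answer: str


# Pydantic v2 derives a JSON Schema from the type annotations.
schema = MathResponse.model_json_schema()
print(json.dumps(schema, indent=2))

# Both fields are required because neither has a default value.
print(schema["required"])
```

The nested `Step` model appears under `$defs` and is referenced from the `steps` array, which is how one schema can describe a list of structured items.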

**Step 2: Supply the object in the API call**

We can use the `parse` method to automatically parse the JSON response into the object we defined. Under the hood, the SDK takes care of supplying the JSON Schema that corresponds to our data structure, and then parses the response into an object of that type.
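
The parsing half of that round trip can be reproduced offline. The sketch below assumes Pydantic v2 and uses a hand-written JSON string (`raw_json`, a stand-in for the text a model might return) to show how a response becomes a typed object. It does not call the API and is not the SDK's internal implementation.

```python
from pydantic import BaseModel


class Step(BaseModel):
    explanation: str
    output: str


class MathResponse(BaseModel):
    steps: list[Step]
    final_answer: str


# Hand-written stand-in for the JSON text a model might return.
raw_json = """
{
  "steps": [
    {"explanation": "Subtract 7 from both sides.", "output": "8x = -30"},
    {"explanation": "Divide both sides by 8.", "output": "x = -3.75"}
  ],
  "final_answer": "x = -3.75"
}
"""

# Validate the JSON against the model and build typed Python objects.
response = MathResponse.model_validate_json(raw_json)

print(response.final_answer)
print(len(response.steps), "steps; first output:", response.steps[0].output)
```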



**Step 3: Handle edge cases**
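
Two edge cases are worth checking before using a result: the model may decline the request, or no parsed object may be available. The sketch below assumes the message object exposes `refusal` and `parsed` attributes, as the OpenAI Python SDK's parsed chat completion messages do. `unwrap_parsed` and `ModelRefusedError` are hypothetical names introduced here, and `SimpleNamespace` objects stand in for real API responses so the example runs offline.

```python
from types import SimpleNamespace


class ModelRefusedError(RuntimeError):
    """Hypothetical error raised when the model declines to answer."""


def unwrap_parsed(message):
    """Return message.parsed, or raise if the message cannot be used."""
    refusal = getattr(message, "refusal", None)
    if refusal:
        # The model declined; the refusal text explains why.
        raise ModelRefusedError(refusal)
    parsed = getattr(message, "parsed", None)
    if parsed is None:
        # No refusal, but nothing was parsed either.
        raise ValueError("response contained no parsed object")
    return parsed


# Offline stand-ins for SDK message objects (not real API responses).
ok = SimpleNamespace(refusal=None, parsed={"final_answer": "x = -3.75"})
refused = SimpleNamespace(refusal="I can't help with that.", parsed=None)

print(unwrap_parsed(ok))

try:
    unwrap_parsed(refused)
except ModelRefusedError as err:
    print("refused:", err)
```

Raising distinct exceptions keeps the calling code from treating a refusal as if it were valid data.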


### Chain of thought
We can ask the model to produce its answer as a structured sequence of steps that guides the user through the solution.

```python
from pydantic import BaseModel
from openai import OpenAI
import pprint as pp

client = OpenAI()

class Step(BaseModel):
    explanation: str
    output: str

class MathReasoning(BaseModel):
    steps: list[Step]
    final_answer: str

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "You are a helpful math tutor. Guide the user through the solution step by step."},
        {"role": "user", "content": "how can I solve 8x + 7 = -23"}
    ],
    response_format=MathReasoning,
)

math_reasoning = completion.choices[0].message.parsed
```

```python
math_reasoning
```

```
MathReasoning(steps=[Step(explanation="To solve for x, we first need to isolate the term with x on one side of the equation. To do this, we'll get rid of the constant on the left side by subtracting 7 from both sides of the equation.", output='8x + 7 - 7 = -23 - 7'), Step(explanation='After simplifying, the left side has 7 - 7, which cancels out, leaving us with only the 8x term. The right side, -23 - 7, simplifies to -30.', output='8x = -30'), Step(explanation="Now we need to solve for x by getting x alone. Since x is currently multiplied by 8, we'll divide both sides of the equation by 8 to isolate x.", output='8x / 8 = -30 / 8'), Step(explanation="Similarly, when we simplify -30 divided by 8, we initially get -3.75, but it can also be represented as a fraction -15/4 if needed. However, let's use the decimal since it is simpler in this case.", output='x = -3.75')], final_answer='x = -3.75')
```

### Generating `enum` values
In some cases we want the model to choose a single option from a fixed list. To implement this, we can include an `enum` in our schema. As in JSON Schema, an enum restricts the model output to a set of allowed values, so it meets the requirements of our application.
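
As a minimal illustration, assuming Pydantic v2, a Python `Enum` used as a field type shows up as an `enum` list in the generated JSON Schema. The `Priority` and `Ticket` names are hypothetical and exist only for this sketch.

```python
from enum import Enum
from pydantic import BaseModel


class Priority(str, Enum):  # hypothetical example enum
    low = "low"
    medium = "medium"
    high = "high"


class Ticket(BaseModel):  # hypothetical example model
    summary: str
    priority: Priority


schema = Ticket.model_json_schema()

# The enum members become the only admissible values for `priority`.
print(schema["$defs"]["Priority"]["enum"])
```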



# Case Study

## Why “residential vs commercial” is worth automating

A postal clerk, a utility company, and a Statistics Canada survey designer all need the same first decision: is this line of text a home or a place of business?

- Canada Post applies different delivery rules and fees once an address is flagged commercial.
- StatsCan’s Address Register keeps separate residential and collective‑dwelling frames to hit the right households during the census.
- Small firms tune marketing spend by excluding PO boxes and corporate towers from “neighbourhood flyer” campaigns.

Doing the split by hand is slow and error‑prone; doing it with a plain LLM prompt is brittle because the model might drift off schema.  Structured output fixes that drift by validating the JSON response against your own Python types before your code ever touches it.



```python
from enum import Enum
from pydantic import BaseModel
from openai import OpenAI
client = OpenAI()

class AddressKind(str, Enum):
    residential = "residential"
    commercial  = "commercial"
    review      = "review"      # model must choose this if uncertain

class ClassifiedAddress(BaseModel):
    kind: AddressKind
    original: str
```

The `Enum` forces the model to pick exactly one of three admissible strings; Pydantic rejects the response otherwise. That guarantee follows directly from Pydantic’s enum‑validation rules.
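
That rejection can be checked offline. The sketch below assumes Pydantic v2 and feeds hand-written payloads to the same model definitions, including one with an out-of-schema label (`"industrial"`, chosen only for illustration). No API call is made.

```python
from enum import Enum
from pydantic import BaseModel, ValidationError


class AddressKind(str, Enum):
    residential = "residential"
    commercial = "commercial"
    review = "review"


class ClassifiedAddress(BaseModel):
    kind: AddressKind
    original: str


# A valid payload is coerced to the matching enum member.
good = ClassifiedAddress.model_validate(
    {"kind": "commercial", "original": "TD Bank, 700 Sussex Dr"}
)
print(good.kind is AddressKind.commercial)

# A label outside the enum is rejected with a ValidationError.
try:
    ClassifiedAddress.model_validate(
        {"kind": "industrial", "original": "illustrative address"}
    )
    rejected = False
except ValidationError as err:
    rejected = True
    print(err.error_count(), "validation error")
```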

The `review` sentinel embodies OpenAI’s own advice: when the task is a judgment call, bake in a “hand‑off to human” option instead of hoping the model never hesitates.
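
One way to act on the `review` label is to split results into an automatic pile and a human queue. The sketch below is an assumption about how such routing might look, not part of the original pipeline. `route` is a hypothetical helper and the sample `(kind, text)` pairs are hand-written, so it runs without the API.

```python
from enum import Enum


class AddressKind(str, Enum):
    residential = "residential"
    commercial = "commercial"
    review = "review"


def route(results):
    """Split (kind, text) pairs into auto-accepted and needs-review lists."""
    accepted, needs_review = [], []
    for kind, text in results:
        if kind is AddressKind.review:
            needs_review.append(text)
        else:
            accepted.append((kind.value, text))
    return accepted, needs_review


# Hand-written sample labels, not model output.
sample = [
    (AddressKind.commercial, "TD Bank, 700 Sussex Dr, Ottawa ON K1N 1K4"),
    (AddressKind.review, "225 Main St, Ottawa ON"),
]

accepted, needs_review = route(sample)
print(accepted)
print(needs_review)
```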

We pass full cues, not bare street lines, so that a human, and therefore the model, can judge without guessing. The last entry has no such cues and serves as the ambiguous case:

```python
ADDRESSES = [
    "TD Bank, 700 Sussex Dr, Ottawa ON K1N 1K4",
    "The French Baker, 119 Murray St, Ottawa ON K1N 5M5",
    "126 Glenview Ave, Ottawa ON K1S 1R2 — 3‑bed detached home",
    "225 Main St, Ottawa ON",
]
```

```python
SYSTEM_PROMPT = (
    "Return a ClassifiedAddress object. "
    "If you cannot tell from the text whether it is residential or commercial, "
    "set kind='review'."
)

for line in ADDRESSES:
    out = client.beta.chat.completions.parse(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": SYSTEM_PROMPT},
                  {"role": "user",   "content": line}],
        response_format=ClassifiedAddress,
    ).choices[0].message.parsed
    print(out)
```

Note that this output includes a `certainty` field, which the `ClassifiedAddress` model above does not define; it appears to have been captured after the model was redefined with that field later in this section.

```
kind=<AddressKind.commercial: 'commercial'> certainty=<Certainty.high: 'high'> original='TD Bank, 700 Sussex Dr, Ottawa ON K1N 1K4'
kind=<AddressKind.commercial: 'commercial'> certainty=<Certainty.high: 'high'> original='The French Baker, 119 Murray St, Ottawa ON K1N 5M5'
kind=<AddressKind.residential: 'residential'> certainty=<Certainty.high: 'high'> original='126 Glenview Ave, Ottawa ON K1S 1R2 — 3‑bed detached home'
kind=<AddressKind.review: 'review'> certainty=<Certainty.low: 'low'> original='225 Main St, Ottawa ON'
```


### Example 1: Binary label only

```python
from enum import Enum
from pydantic import BaseModel
from openai import OpenAI
from pprint import pprint

client = OpenAI()              # expects OPENAI_API_KEY in env

class AddressKind(str, Enum):
    residential = "residential"
    commercial  = "commercial"

class ClassifiedAddress(BaseModel):
    kind: AddressKind
    original: str

ADDRESSES = [
    "TD Bank, 700 Sussex Dr, Ottawa ON K1N 1K4",
    "The French Baker, 119 Murray St, Ottawa ON K1N 5M5",
    "126 Glenview Ave, Ottawa ON K1S 1R2 — 3‑bed detached home",
    "225 Main St, Ottawa ON",
]

SYSTEM = "Label the address as residential or commercial."

results = [
    client.beta.chat.completions.parse(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": SYSTEM},
                  {"role": "user", "content": line}],
        response_format=ClassifiedAddress,
    ).choices[0].message.parsed.model_dump()
    for line in ADDRESSES
]

print("Example 1 –results:")
pprint(results, compact=True)
```

```
Example 1 –results:
[{'kind': <AddressKind.commercial: 'commercial'>,
  'original': 'TD Bank, 700 Sussex Dr, Ottawa ON K1N 1K4'},
 {'kind': <AddressKind.commercial: 'commercial'>,
  'original': 'The French Baker, 119 Murray St, Ottawa ON K1N 5M5'},
 {'kind': <AddressKind.residential: 'residential'>,
  'original': '126 Glenview Ave, Ottawa ON K1S 1R2 — 3‑bed detached home'},
 {'kind': <AddressKind.commercial: 'commercial'>,
  'original': '225 Main St, Ottawa ON'}]
```


### Example 2: With certainty

```python
from pydantic import BaseModel, model_validator
from enum import Enum

class Certainty(str, Enum):
    high = "high"
    low  = "low"

class AddressKind(str, Enum):
    residential = "residential"
    commercial  = "commercial"
    review      = "review"          # forced fallback

class ClassifiedAddress(BaseModel):
    kind: AddressKind
    certainty: Certainty
    original: str

    # Runs on the raw input before field validation:
    # a low-certainty answer is always routed to review.
    @model_validator(mode="before")
    @classmethod
    def route_low_confidence(cls, data):
        if data.get("certainty") == "low":
            data["kind"] = "review"
        return data

SYSTEM = (
    "Return a ClassifiedAddress object. "
    "If you are not confident, set certainty='low'."
)

# Reuses `client`, `ADDRESSES`, and `pprint` from Example 1.
results2 = [
    client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[{"role": "system", "content": SYSTEM},
                  {"role": "user", "content": line}],
        response_format=ClassifiedAddress,
    ).choices[0].message.parsed.model_dump()
    for line in ADDRESSES
]

print("Example 2 – with certainty:\n")
pprint(results2, compact=True)
```


```
Example 2 – with certainty:

[{'certainty': <Certainty.high: 'high'>,
  'kind': <AddressKind.commercial: 'commercial'>,
  'original': 'TD\u202fBank,\u202f700\u202fSussex\u202fDr, Ottawa ON K1N\u202f'
              '1K4'},
 {'certainty': <Certainty.high: 'high'>,
  'kind': <AddressKind.commercial: 'commercial'>,
  'original': 'The French\u202fBaker,\u202f119\u202fMurray\u202fSt, '
              'Ottawa\u202fON\u202fK1N\u202f5M5'},
 {'certainty': <Certainty.high: 'high'>,
  'kind': <AddressKind.residential: 'residential'>,
  'original': '126\u202fGlenview\u202fAve, Ottawa\u202fON\u202fK1S\u202f1R2 — '
              '3‑bed detached home'},
 {'certainty': <Certainty.low: 'low'>,
  'kind': <AddressKind.review: 'review'>,
  'original': '225\u202fMain\u202fSt, Ottawa\u202fON'}]
```
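
The validator's effect can be verified without calling the API. The sketch below assumes Pydantic v2 and passes hand-written payloads (not model output) through the same model definitions: a `low` certainty overrides whatever `kind` was supplied. Unlike the cell above, this variant copies the input dictionary instead of mutating it, which is a design choice of the sketch.

```python
from enum import Enum
from pydantic import BaseModel, model_validator


class Certainty(str, Enum):
    high = "high"
    low = "low"


class AddressKind(str, Enum):
    residential = "residential"
    commercial = "commercial"
    review = "review"


class ClassifiedAddress(BaseModel):
    kind: AddressKind
    certainty: Certainty
    original: str

    @model_validator(mode="before")
    @classmethod
    def route_low_confidence(cls, data):
        # Runs on the raw input before field validation.
        if isinstance(data, dict) and data.get("certainty") == "low":
            data = {**data, "kind": "review"}  # copy, do not mutate input
        return data


# Hand-written payloads standing in for model responses.
unsure = ClassifiedAddress.model_validate(
    {"kind": "commercial", "certainty": "low", "original": "225 Main St, Ottawa ON"}
)
sure = ClassifiedAddress.model_validate(
    {"kind": "commercial", "certainty": "high", "original": "TD Bank, 700 Sussex Dr"}
)
print(unsure.kind.value, sure.kind.value)
```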
