# Lesson 4: Structured Outputs

This notebook explores **structured outputs**: techniques for making an LLM return data in a predictable, machine-readable format.

We will use the `google-genai` library to interact with Google's Gemini models.

**Learning Objectives:**

1.  **Understand structured outputs** and why they are crucial for reliable data extraction from LLMs.
2.  **Enforce structured data formats (JSON)** from an LLM using prompt engineering techniques.
3.  **Leverage Pydantic models** to define and manage complex data structures for structured outputs, improving code robustness and clarity.
4.  **Use Gemini's native structured output capabilities** for the most reliable and efficient approach.

## 1. Setup

First, we load the IPython `autoreload` extension so that Python modules are reloaded automatically whenever they change:

In [1]:
%load_ext autoreload
%autoreload 2

### Set Up Python Environment

To set up your Python virtual environment using `uv` and load it into the Notebook, follow the step-by-step instructions in the `Course Admin` lesson at the beginning of the course.

**TL;DR:** Be sure the correct kernel pointing to your `uv` virtual environment is selected.

### Configure Gemini API

To configure the Gemini API, follow the step-by-step instructions from the `Course Admin` lesson.

Here is a quick checklist of what you need to run this Notebook:

1.  Get your key from [Google AI Studio](https://aistudio.google.com/app/apikey).
2.  From the root of your project, run: `cp .env.example .env`
3.  Within the `.env` file, fill in the `GOOGLE_API_KEY` variable.

Now, the code below will load the key from the `.env` file:

In [2]:
from utils import env

env.load(required_env_vars=["GOOGLE_API_KEY"])

Trying to load environment variables from `/Users/fabio/Desktop/course-ai-agents/.env`
Environment variables loaded successfully.


### Import Key Packages

In [3]:
import json

from google import genai
from google.genai import types
from pydantic import BaseModel, Field

from utils import pretty_print

### Initialize the Gemini Client

In [4]:
client = genai.Client()

### Define Constants

We will use the `gemini-2.5-flash` model, which is fast and cost-effective:

In [5]:
MODEL_ID = "gemini-2.5-flash"

## 2. Implementing structured outputs from scratch using JSON

Sometimes, you don't need the LLM to take an action, but you need its output in a specific, machine-readable format. Forcing the output to be JSON is a common way to achieve this.

We can instruct the model to do this through **prompting**, by clearly describing the desired JSON structure in the prompt.
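
As a minimal sketch of that idea, the structure shown to the model can be serialized from a Python dict, so the example embedded in the prompt is always valid JSON. The `build_prompt` helper, the `OUTPUT_TEMPLATE` dict, and the one-line document are hypothetical names chosen for illustration:

```python
import json

# Hypothetical template: placeholder values describe what each field should hold.
OUTPUT_TEMPLATE = {
    "summary": "A concise summary of the article.",
    "tags": ["list", "of", "relevant", "tags"],
    "keywords": ["list", "of", "key", "concepts"],
}


def build_prompt(document: str, template: dict) -> str:
    """Build an extraction prompt whose example structure is always valid JSON."""
    # json.dumps cannot emit trailing commas or unbalanced braces.
    structure = json.dumps(template, indent=4)
    return (
        "Analyze the following document and extract metadata from it.\n"
        "The output must be a single, valid JSON object with the following structure:\n"
        f"<json>\n{structure}\n</json>\n\n"
        f"Here is the document:\n<document>\n{document}\n</document>\n"
    )


example_prompt = build_prompt("Revenue grew 20% in Q3.", OUTPUT_TEMPLATE)
print(example_prompt)
```

Generating the structure this way avoids hand-typed JSON mistakes in the prompt. It does not, by itself, guarantee that the model replies with valid JSON.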

### Example: Extracting Metadata from a Document

Let's imagine we have a markdown document and we want to extract key information like a summary, tags, and keywords into a clean JSON object.

In [6]:
DOCUMENT = """
# Q3 2023 Financial Performance Analysis

The Q3 earnings report shows a 20% increase in revenue and a 15% growth in user engagement, 
beating market expectations. These impressive results reflect our successful product strategy 
and strong market positioning.

Our core business segments demonstrated remarkable resilience, with digital services leading 
the growth at 25% year-over-year. The expansion into new markets has proven particularly 
successful, contributing to 30% of the total revenue increase.

Customer acquisition costs decreased by 10% while retention rates improved to 92%, 
marking our best performance to date. These metrics, combined with our healthy cash flow 
position, provide a strong foundation for continued growth into Q4 and beyond.
"""

prompt = f"""
Analyze the following document and extract metadata from it. 
The output must be a single, valid JSON object with the following structure:
<json>
{{
    "summary": "A concise summary of the article.",
    "tags": ["list", "of", "relevant", "tags"],
    "keywords": ["list", "of", "key", "concepts"],
    "quarter": "Q...",
    "growth_rate": "...%"
}}
</json>

Here is the document:
<document>
{DOCUMENT}
</document>
"""

response = client.models.generate_content(model=MODEL_ID, contents=prompt)

pretty_print.wrapped(text=response.text, title="Raw LLM Output", indent=2)

------------------------------------------ Raw LLM Output ------------------------------------------
  ```json
{
    "summary": "The Q3 2023 financial report highlights a strong performance with a 20% increase in revenue and 15% growth in user engagement, surpassing market expectations. This success is attributed to a robust product strategy, effective market positioning, and successful expansion into new markets, leading to improved customer retention and reduced acquisition costs.",
    "tags": [
        "financials",
        "earnings report",
        "business performance",
        "revenue growth",
        "market expansion",
        "Q3 2023"
    ],
    "keywords": [
        "Q3 2023",
        "revenue",
        "user engagement",
        "market expectations",
        "product strategy",
        "market positioning",
        "digital services",
        "new markets",
        "customer acquisition costs",
        "retention rates",
        "cash flow"
    ],
    "quarter

In [7]:
def extract_json_from_response(response: str) -> dict:
    """
    Extracts JSON from a response string that is wrapped in <json> or ```json tags.
    """

    response = response.replace("<json>", "").replace("</json>", "")
    response = response.replace("```json", "").replace("```", "")

    return json.loads(response)
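
The string replacements in `extract_json_from_response` assume the reply contains nothing but the JSON and its wrappers. As a hedged sketch of a more tolerant alternative (the `extract_json_object` name and the sample strings are hypothetical, not part of the course utilities), the following looks for a fenced block first and then falls back to the outermost braces:

```python
import json
import re

# Matches a fenced block such as ```json ... ``` and captures its body.
_FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)```", re.DOTALL)


def extract_json_object(text: str) -> dict:
    """Parse the first JSON object found in `text`, ignoring common wrappers."""
    fenced = _FENCE_RE.search(text)
    candidate = fenced.group(1) if fenced else text
    # Fall back to the outermost braces to skip <json> tags or surrounding prose.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in the model output.")
    return json.loads(candidate[start : end + 1])


samples = [
    '```json\n{"quarter": "Q3 2023"}\n```',
    '<json>{"quarter": "Q3 2023"}</json>',
    'Here is the result:\n{"quarter": "Q3 2023"}\nLet me know if you need more.',
]
for sample in samples:
    print(extract_json_object(sample))
```

This still fails if the surrounding prose itself contains braces, so it is a tolerance measure rather than a guarantee.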

With `extract_json_from_response`, you can now parse the model output into a Python dictionary:

In [8]:
parsed_response = extract_json_from_response(response.text)
pretty_print.wrapped(
    text=[f"Type of the parsed response: `{type(parsed_response)}`", json.dumps(parsed_response, indent=2)],
    title="Parsed JSON Object",
    indent=2,
)

---------------------------------------- Parsed JSON Object ----------------------------------------
  Type of the parsed response: `<class 'dict'>`
----------------------------------------------------------------------------------------------------
  {
  "summary": "The Q3 2023 financial report highlights a strong performance with a 20% increase in revenue and 15% growth in user engagement, surpassing market expectations. This success is attributed to a robust product strategy, effective market positioning, and successful expansion into new markets, leading to improved customer retention and reduced acquisition costs.",
  "tags": [
    "financials",
    "earnings report",
    "business performance",
    "revenue growth",
    "market expansion",
    "Q3 2023"
  ],
  "keywords": [
    "Q3 2023",
    "revenue",
    "user engagement",
    "market expectations",
    "product strategy",
    "market positioning",
    "digital services",
    "new markets",
    "customer acqu

## 3. Implementing structured outputs from scratch using Pydantic

Prompting for JSON works, but it is fragile: nothing checks that the output contains the right fields or types. A more robust approach is to use **Pydantic**, which lets you define data structures as Python classes. This gives you:

- **A single source of truth**: The Pydantic model defines the structure.
- **Automatic schema generation**: You can easily generate a JSON Schema from the model.
- **Data validation**: You can validate the LLM's output against the model to ensure it conforms to the expected structure and types.
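
The three points above can be illustrated with a small, self-contained sketch. The `ReportFacts` model and both payloads are hypothetical and exist only for this example:

```python
from pydantic import BaseModel, Field, ValidationError


class ReportFacts(BaseModel):
    """Hypothetical model used only for this illustration."""

    quarter: str = Field(description="Quarter covered by the report.")
    revenue_growth_pct: float = Field(description="Revenue growth in percent.")
    tags: list[str] = Field(description="High-level tags.")


# 1. Single source of truth: the JSON Schema is derived from the class.
schema = ReportFacts.model_json_schema()
print(schema["required"])

# 2. A well-formed payload validates into a typed object.
good = ReportFacts.model_validate(
    {"quarter": "Q3 2023", "revenue_growth_pct": 20, "tags": ["earnings"]}
)
print(good.revenue_growth_pct)

# 3. A malformed payload raises ValidationError listing every offending field.
try:
    ReportFacts.model_validate({"quarter": "Q3 2023", "revenue_growth_pct": "twenty"})
except ValidationError as error:
    bad_fields = sorted(str(err["loc"][0]) for err in error.errors())
    print(bad_fields)
```

Here the second payload fails on two counts at once: one field has the wrong type and another is missing. Both are reported in a single exception.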

Let's recreate the previous example using Pydantic.

In [9]:
class DocumentMetadata(BaseModel):
    """A class to hold structured metadata for a document."""

    summary: str = Field(description="A concise, 1-2 sentence summary of the document.")
    tags: list[str] = Field(description="A list of 3-5 high-level tags relevant to the document.")
    keywords: list[str] = Field(description="A list of specific keywords or concepts mentioned.")
    quarter: str = Field(description="The quarter of the financial year described in the document (e.g, Q3 2023).")
    growth_rate: str = Field(description="The growth rate of the company described in the document (e.g, 10%).")

### Injecting Pydantic Schema into the Prompt

We can generate a JSON Schema from our Pydantic model and inject it directly into the prompt. This is a more formal way of telling the LLM what structure to follow.

Note that, along with each field's type, the schema automatically carries the `Field` descriptions, which tell the LLM what each field means.

In [10]:
schema = DocumentMetadata.model_json_schema()
schema

{'description': 'A class to hold structured metadata for a document.',
 'properties': {'summary': {'description': 'A concise, 1-2 sentence summary of the document.',
   'title': 'Summary',
   'type': 'string'},
  'tags': {'description': 'A list of 3-5 high-level tags relevant to the document.',
   'items': {'type': 'string'},
   'title': 'Tags',
   'type': 'array'},
  'keywords': {'description': 'A list of specific keywords or concepts mentioned.',
   'items': {'type': 'string'},
   'title': 'Keywords',
   'type': 'array'},
  'quarter': {'description': 'The quarter of the financial year described in the document (e.g, Q3 2023).',
   'title': 'Quarter',
   'type': 'string'},
  'growth_rate': {'description': 'The growth rate of the company described in the document (e.g, 10%).',
   'title': 'Growth Rate',
   'type': 'string'}},
 'required': ['summary', 'tags', 'keywords', 'quarter', 'growth_rate'],
 'title': 'DocumentMetadata',
 'type': 'object'}

In [11]:
prompt = f"""
Please analyze the following document and extract metadata from it. 
The output must be a single, valid JSON object that conforms to the following JSON Schema:
<json>
{json.dumps(schema, indent=2)}
</json>

Here is the document:
<document>
{DOCUMENT}
</document>
"""

response = client.models.generate_content(model=MODEL_ID, contents=prompt)

parsed_response = extract_json_from_response(response.text)

pretty_print.wrapped(
    text=[f"Type of the parsed response: `{type(parsed_response)}`", json.dumps(parsed_response, indent=2)],
    title="Parsed JSON Object",
    indent=2,
)

---------------------------------------- Parsed JSON Object ----------------------------------------
  Type of the parsed response: `<class 'dict'>`
----------------------------------------------------------------------------------------------------
  {
  "summary": "The Q3 2023 earnings report indicates strong financial performance with a 20% increase in revenue and 15% growth in user engagement, driven by successful product strategy, market expansion, and improved customer retention.",
  "tags": [
    "Financial Performance",
    "Earnings Report",
    "Business Growth",
    "Market Expansion",
    "Customer Metrics"
  ],
  "keywords": [
    "Q3 2023",
    "revenue increase",
    "user engagement",
    "digital services",
    "new markets",
    "customer acquisition costs",
    "retention rates",
    "cash flow"
  ],
  "quarter": "Q3 2023",
  "growth_rate": "20%"
}
-------------------------------------------------------------------------------------------------

Conceptually, the results are the same as before. The difference is that we can now validate the output with Pydantic:

In [12]:
try:
    document_metadata = DocumentMetadata.model_validate(parsed_response)
    print("\nValidation successful!")

    pretty_print.wrapped(
        [f"Type of the validated response: `{type(document_metadata)}`", document_metadata.model_dump_json(indent=2)],
        title="Pydantic Validated Object",
        indent=2,
    )
except Exception as e:
    print(f"\nValidation failed: {e}")


Validation successful!
------------------------------------ Pydantic Validated Object ------------------------------------
  Type of the validated response: `<class '__main__.DocumentMetadata'>`
----------------------------------------------------------------------------------------------------
  {
  "summary": "The Q3 2023 earnings report indicates strong financial performance with a 20% increase in revenue and 15% growth in user engagement, driven by successful product strategy, market expansion, and improved customer retention.",
  "tags": [
    "Financial Performance",
    "Earnings Report",
    "Business Growth",
    "Market Expansion",
    "Customer Metrics"
  ],
  "keywords": [
    "Q3 2023",
    "revenue increase",
    "user engagement",
    "digital services",
    "new markets",
    "customer acquisition costs",
    "retention rates",
    "cash flow"
  ],
  "quarter": "Q3 2023",
  "growth_rate": "20%"
}
------------------------------------------------------------
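
The `try`/`except` in the validation cell only reports a failure. One possible follow-up, sketched here under assumptions, is to send the validation error back to the model and ask again. `generate_validated`, `fake_model`, and the reduced `QuarterSummary` model are hypothetical; `fake_model` stands in for a real `client.models.generate_content` call so the loop runs offline:

```python
from typing import Callable

from pydantic import BaseModel, ValidationError


class QuarterSummary(BaseModel):
    """Hypothetical, reduced model for this illustration."""

    quarter: str
    growth_rate: str


def generate_validated(
    call_model: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> QuarterSummary:
    """Call the model until its output validates, feeding errors back on retry."""
    current_prompt = prompt
    for _ in range(max_attempts):
        raw = call_model(current_prompt)
        try:
            # Parses the JSON text and validates it against the model in one step.
            return QuarterSummary.model_validate_json(raw)
        except ValidationError as error:
            # Append the validation report so the next attempt can self-correct.
            current_prompt = (
                f"{prompt}\n\nYour previous answer was invalid:\n{error}\n"
                "Return only a corrected JSON object."
            )
    raise RuntimeError(f"No valid output after {max_attempts} attempts.")


# Stand-in for a real LLM call: the first reply misses a field, the second is valid.
replies = iter(['{"quarter": "Q3 2023"}', '{"quarter": "Q3 2023", "growth_rate": "20%"}'])
seen_prompts: list[str] = []


def fake_model(prompt: str) -> str:
    seen_prompts.append(prompt)
    return next(replies)


result = generate_validated(fake_model, "Extract the quarter and growth rate.")
print(result)
```

Capping the number of attempts keeps a persistently malformed reply from looping forever. Note that this sketch expects bare JSON text, so a reply wrapped in Markdown fences would need to be unwrapped first.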

## 4. Implementing structured outputs using Gemini and Pydantic

Using Gemini's `GenerateContentConfig`, we can have the model return output that matches a Pydantic model without any special prompt engineering.

To do this, we set `response_mime_type` to `"application/json"` in the generation configuration, which forces the model's output to be valid JSON, and we set `response_schema` to our Pydantic model.

**Note:** If you set only `response_mime_type="application/json"`, the model returns JSON without a schema constraint.
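
When only the MIME type is set, the reply is JSON text that you parse and validate yourself. A minimal sketch of both options follows. Here `raw_text` is a hypothetical stand-in for `response.text` (no API call is made) and `QuarterSummary` is a reduced, hypothetical model:

```python
import json

from pydantic import BaseModel


class QuarterSummary(BaseModel):
    """Hypothetical, reduced model for this illustration."""

    quarter: str
    growth_rate: str


# Hypothetical stand-in for `response.text` from a JSON-mode call.
raw_text = '{"quarter": "Q3 2023", "growth_rate": "20%"}'

# Option 1: plain dictionary, with no guarantees beyond valid JSON.
as_dict = json.loads(raw_text)

# Option 2: parse and validate in one step against the model.
as_model = QuarterSummary.model_validate_json(raw_text)

print(type(as_dict).__name__, as_dict["quarter"])
print(type(as_model).__name__, as_model.growth_rate)
```

Option 2 gives typed attribute access and raises a `ValidationError` if a field is missing or has the wrong type.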

In [13]:
config = types.GenerateContentConfig(response_mime_type="application/json", response_schema=DocumentMetadata)

prompt = f"""
Analyze the following document and extract its metadata.

Here is the document:
<document>
{DOCUMENT}
</document>
"""

response = client.models.generate_content(model=MODEL_ID, contents=prompt, config=config)
pretty_print.wrapped(
    [f"Type of the response: `{type(response.parsed)}`", response.parsed.model_dump_json(indent=2)],
    title="Pydantic Validated Object",
    indent=2,
)

------------------------------------ Pydantic Validated Object ------------------------------------
  Type of the response: `<class '__main__.DocumentMetadata'>`
----------------------------------------------------------------------------------------------------
  {
  "summary": "The Q3 2023 earnings report shows a 20% increase in revenue and 15% growth in user engagement, exceeding market expectations due to successful product strategy and market expansion. Customer acquisition costs decreased by 10% and retention improved to 92%, indicating a strong foundation for continued growth.",
  "tags": [
    "Financial Performance",
    "Earnings Report",
    "Revenue Growth",
    "Market Expansion",
    "User Engagement"
  ],
  "keywords": [
    "Q3 2023",
    "revenue increase",
    "user engagement",
    "product strategy",
    "market positioning",
    "digital services",
    "new markets",
    "customer acquisition costs",
    "retention rates",
    "cash flow"
  ],
  "

For the rest of this course, we will use this native Gemini approach to generate structured outputs, as it is the most reliable and efficient of the three. When using LangChain or LangGraph, we will rely on their abstractions, which build on the same logic.