# Getting Started with Structured Output

We will be using the [OpenAI APIs](https://platform.openai.com/) for all examples.

---

## 2. Structured Output with LLMs

Objectives
- Load the necessary libraries
- Understand structured output formats
- Create prompts for structured data extraction
- Explore common use cases for structured outputs

Below we are loading the necessary libraries, utilities, and configurations.

In [0]:
# update or install the necessary libraries
!pip install openai
!pip install --upgrade typing_extensions

Collecting openai
  Downloading openai-2.1.0-py3-none-any.whl (964 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 964.9/964.9 kB 14.4 MB/s eta 0:00:00
Collecting typing-extensions<5,>=4.11
  Downloading typing_extensions-4.15.0-py3-none-any.whl (44 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 44.6/44.6 kB 8.9 MB/s eta 0:00:00
Collecting httpx<1,>=0.23.0
  Downloading httpx-0.28.1-py3-none-any.whl (73 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 73.5/73.5 kB 25.9 MB/s eta 0:00:00
Collecting tqdm>4
  Downloading tqdm-4.67.1-py3-none-any.whl (78 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.5/78.5 kB 25.2 MB/s eta 0:00:00
Collecting jiter<1,>=0.4.0
  Downloading jiter-0.11.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (348 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 348.7/348.7 kB 84.0 MB/s eta 0:00:00
Collecting httpcore==1.*
  Downloading httpcore-1.0.9-py3-none-any.whl (78 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.8/78.8 kB 34.0 MB/s eta 

In [0]:
dbutils.library.restartPython()

In [0]:
from openai import OpenAI
import os
from pydantic import BaseModel
import IPython

DATABRICKS_TOKEN = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiToken().get()

client = OpenAI(
  api_key=DATABRICKS_TOKEN,
  base_url="https://adb-3750392177977863.3.azuredatabricks.net/serving-endpoints"
)

### 2.1 Generic extraction

In [0]:
chat_completion = client.chat.completions.create(
  messages=[
      {"role": "system", "content": "Extract the event information."},
      {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
  ],
  model="databricks-gpt-oss-120b",
  max_tokens=256
)

In [0]:
IPython.display.Markdown(chat_completion.choices[0].message.content[1]['text'])

**Extracted Event Information**

```json
{
  "event": "science fair",
  "participants": [
    "Alice",
    "Bob"
  ],
  "date": "Friday"
}
```

### 2.2 Explicit request json only

In [0]:
chat_completion = client.chat.completions.create(
  messages=[
      {"role": "system", "content": "Extract the event information in json format. Make sure to only return a json object in your response"},
      {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
  ],
  model="databricks-gpt-oss-120b",
  max_tokens=256
)

In [0]:
IPython.display.Markdown(chat_completion.choices[0].message.content[1]['text'])

{
  "participants": ["Alice", "Bob"],
  "event": "science fair",
  "date": "Friday"
}

### 2.3 Explicit request json only + format + examples

In [0]:
schema = """
Return a JSON object with the following fields:
- "event": string, the name of the event
- "participants": list of strings, the people involved
- "date": string, the date of the event (if available)
- "location": string, the location of the event (if available)
Only return a JSON object matching this schema.
"""

few_shot_examples = [
    # Example 1
    {"role": "user", "content": "John and Mary will attend a wedding in Paris on Saturday."},
    {"role": "assistant", "content": '''
{
  "event": "wedding",
  "participants": ["John", "Mary"],
  "date": "Saturday",
  "location": "Paris"
}
'''},

    # Example 2
    {"role": "user", "content": "Bob is going to a conference next week."},
    {"role": "assistant", "content": '''
{
  "event": "conference",
  "participants": ["Bob"],
  "date": "next week",
  "location": ""
}
'''},
]

messages = [
    {"role": "system", "content": f"Extract event information. {schema}"},
    *few_shot_examples,
    {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
]

chat_completion = client.chat.completions.create(
    messages=messages,
    model="databricks-gpt-oss-120b",
    max_tokens=256
)

In [0]:
IPython.display.Markdown(chat_completion.choices[0].message.content[1]['text'])

{
  "event": "science fair",
  "participants": ["Alice", "Bob"],
  "date": "Friday",
  "location": ""
}

### 2.4 Structured Output

More info on proper structured output available here: https://platform.openai.com/docs/guides/structured-outputs?example=structured-data

### 2.5 Examples

In [0]:
# 1. Sentiment Analysis Extraction
messages = [
    {"role": "system", "content": "Extract sentiment as a JSON object with fields: sentiment (positive, negative, neutral), and reason."},
    {"role": "user", "content": "The product exceeded my expectations and I would buy it again."}
]
response_format = {"type": "json_object"}
response = client.chat.completions.create(
    model="databricks-gpt-oss-120b",
    messages=messages
)

IPython.display.Markdown(response.choices[0].message.content[1]['text'])

{
  "sentiment": "positive",
  "reason": "The statement expresses strong satisfaction ('exceeded my expectations') and a willingness to repurchase, indicating a clearly positive sentiment."
}

In [0]:
# 2. Product Attribute Extraction
messages = [
    {"role": "system", "content": "Extract product attributes as JSON: name, price, color, size, and ensure the response is in JSON format."},
    {"role": "user", "content": "The SmartHome Mini is a compact smart home assistant available in black or white for only $49.99. It's 5 inches wide."}
]
response_format = {"type": "json_object"}
response = client.chat.completions.create(
    model="databricks-gpt-oss-120b",
    messages=messages
)

IPython.display.Markdown(response.choices[0].message.content[1]['text'])

```json
{
  "name": "SmartHome Mini",
  "price": 49.99,
  "color": ["black", "white"],
  "size": "5 inches wide"
}
```

In [0]:
# 3. Location Extraction
messages = [
    {"role": "system", "content": "Extract location details as a JSON object with fields: city, country."},
    {"role": "user", "content": "I recently traveled from Paris, France to Berlin, Germany for a conference."}
]
response_format = {"type": "json_object"}
response = client.chat.completions.create(
    model="databricks-gpt-oss-120b",
    messages=messages
)

IPython.display.Markdown(response.choices[0].message.content[1]['text'])

```json
[
  {
    "city": "Paris",
    "country": "France"
  },
  {
    "city": "Berlin",
    "country": "Germany"
  }
]
```

In [0]:
# 4. Contact Information Extraction
messages = [
    {"role": "system", "content": "Extract contact information as a JSON object with fields: name, email, phone."},
    {"role": "user", "content": "You can reach Jane Doe at jane.doe@email.com or call her at 555-1234."}
]
response_format = {"type": "json_object"}
response = client.chat.completions.create(
    model="databricks-gpt-oss-120b",
    messages=messages
)

IPython.display.Markdown(response.choices[0].message.content[1]['text'])

```json
{
  "name": "Jane Doe",
  "email": "jane.doe@email.com",
  "phone": "555-1234"
}
```