<a href="https://colab.research.google.com/github/patrickfleith/datapipes/blob/main/Structured_Output_with_OpenAI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Structured Output with OpenAI
To build reliable pipelines in which LLMs consistently return output in the same format, we use **Structured Output**:
- We define a blueprint for the output.
- We pass the 'blueprint' to the LLM.
- The LLM output then conforms to the blueprint.

In LLM jargon, this 'blueprint' is often called a "schema".

In [None]:
# openai for LLM, pydantic to define the schema
!pip install openai pydantic --quiet

In [None]:
from pydantic import BaseModel
from openai import OpenAI
from google.colab import userdata

In [None]:
OPENAI_API_KEY = userdata.get("OPENAI_API_KEY")

## Getting started with Structured Generation
Let's imagine we want to consistently generate RPG characters with:
- a name
- an age
- a city
- a profession
- a background story
- an inventory

We'll define the schema (the blueprint) for the structured output.
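
As a rough illustration of what a blueprint looks like once it becomes a schema, the sketch below builds a toy pydantic model and prints the JSON Schema derived from it. The names `MiniCity` and `MiniCharacter` are made up for this example, and it assumes pydantic v2, where `model_json_schema()` is available.

```python
import json
from enum import Enum

from pydantic import BaseModel


# Hypothetical toy models, used only to inspect the generated schema
class MiniCity(str, Enum):
    aria = "Aria"
    kniga = "Kniga"


class MiniCharacter(BaseModel):
    name: str
    age: int
    city: MiniCity


# Ask pydantic (v2) for the JSON Schema that describes MiniCharacter
schema = MiniCharacter.model_json_schema()
print(json.dumps(schema, indent=2))
```

In the printed schema, each field appears with its type, and the enum appears as a closed list of allowed strings. This is the kind of machine-readable description that a structured-output request relies on.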

In [None]:
# Reuse the key loaded from the Colab secrets above
client = OpenAI(api_key=OPENAI_API_KEY)

In [None]:
from enum import Enum

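# Restrict the city to a closed set of allowed values
class City(str, Enum):
    aria = "Aria"
    kniga = "Kniga"
    aquabah = "Aquabah"
    torini = "Torini"

# The schema (blueprint) every generated character must follow
class Character(BaseModel):
    name: str
    age: int
    city: City
    job: str
    # Field names are part of the schema, so this name also hints at the expected length
    two_sentences_background_story: str
    inventory: list[str]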

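# parse() sends the Character schema with the request and parses the reply into a Character instance
completion = client.beta.chat.completions.parse(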
    model="gpt-4o-mini",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful table-top RPG gamemaster assistant."},
        {
            "role": "user",
            "content": "Generate a character for a low-fantasy RPG campaign. Be creative"},
    ],
    temperature=1.0,
    response_format=Character,
)

In [None]:
message = completion.choices[0].message

In [None]:
print(f"Name: {message.parsed.name}")
print(f"age: {message.parsed.age}")
print(f"City: {message.parsed.city}")

print(f"Job: {message.parsed.job}")
print(f"Inventory: {message.parsed.inventory}\n")

Name: Elara Thorne
age: 28
City: Aria
Job: Herbalist
Inventory: ['Healing herbs', 'Flask of elixir', 'Dagger', 'Leather satchel', 'Map of local flora']



In [None]:
message.parsed.two_sentences_background_story

'Elara grew up in the bustling city of Aria, learning the art of herbalism from her grandmother, a renowned healer. After a tragic incident involving a corrupt nobleman, she now travels the land, seeking justice for those wronged and using her knowledge of plants to aid the less fortunate.'
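
To make "conforming to the blueprint" more concrete, here is a minimal local sketch that needs no API call. It validates two hand-written JSON strings against a toy model, assuming pydantic v2. `DemoCity` and `DemoCharacter` are hypothetical names introduced for this example, and the check only mirrors what the schema implies; it is not a description of how the API enforces the format.

```python
from enum import Enum

from pydantic import BaseModel, ValidationError


# Hypothetical toy models, smaller than the Character schema above
class DemoCity(str, Enum):
    aria = "Aria"
    kniga = "Kniga"


class DemoCharacter(BaseModel):
    name: str
    age: int
    city: DemoCity


# JSON that matches the blueprint parses into a typed object
good = DemoCharacter.model_validate_json(
    '{"name": "Elara Thorne", "age": 28, "city": "Aria"}'
)
print(good.city.value)

# JSON with a city outside the enum is rejected
try:
    DemoCharacter.model_validate_json(
        '{"name": "Elara Thorne", "age": 28, "city": "Gotham"}'
    )
    failed_fields = []
except ValidationError as exc:
    # Collect the location of each failing field
    failed_fields = [err["loc"] for err in exc.errors()]

print(failed_fields)
```

The second string fails only on `city`, because "Gotham" is not one of the values listed in the enum.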

# Structured Output with Instructor

`instructor` is a popular library for getting structured outputs from LLMs, designed for simplicity, transparency, and control.

It also uses pydantic, so the code looks very similar.

In [None]:
!pip install instructor --quiet

In [None]:
import instructor

# Patch the OpenAI client
client = instructor.from_openai(
    OpenAI(
        api_key=userdata.get('OPENAI_API_KEY')
        )
    )

# Generate structured data from natural language
character = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=Character,
    messages=[
        {
            "role": "system",
            "content": "You are a helpful table-top RPG gamemaster assistant."},
        {
            "role": "user",
            "content": "Generate a Warrior character for a low-fantasy RPG campaign. Be creative"},
    ],
    temperature=1.5
)

In [None]:
character.name

'Darek Ironhand'

In [None]:
character.age

32

In [None]:
character.city

<City.aria: 'Aria'>

In [None]:
character.job

'Mercenary Warrior'

In [None]:
character.two_sentences_background_story

'Darek Ironhand hails from a small village on the outskirts of Aria, where he learned the art of combat by defending his home from raiders. After losing his family to a brutal attack, he took up mercenary work to avenge his loved ones and protect the weak.'
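
Because the result is a pydantic model instance, it can be passed on to the rest of a pipeline as a plain dictionary or a JSON string. The sketch below shows one way to do that, assuming pydantic v2. It builds an instance by hand rather than calling the API, and `DemoCity` and `DemoCharacter` are hypothetical stand-ins for the models above.

```python
from enum import Enum

from pydantic import BaseModel


# Hypothetical stand-ins for the City and Character models
class DemoCity(str, Enum):
    aria = "Aria"
    kniga = "Kniga"


class DemoCharacter(BaseModel):
    name: str
    age: int
    city: DemoCity
    job: str


# Built by hand here; in the notebook this object would come from the LLM call
hero = DemoCharacter(
    name="Darek Ironhand",
    age=32,
    city=DemoCity.aria,
    job="Mercenary Warrior",
)

# mode="json" converts the enum member to its plain string value
as_dict = hero.model_dump(mode="json")
as_json = hero.model_dump_json()

# The JSON string can be loaded back into the same model later
restored = DemoCharacter.model_validate_json(as_json)

print(as_dict)
print(restored == hero)
```

Storing `as_json` and reloading it with `model_validate_json` keeps the same validation rules at both ends of the pipeline.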