## Summary

Repo: https://github.com/pgahq/instructor-groq-openai-llm-examples

This notebook shows how to use Instructor to extract structured info (including lists) from unstructured text.

Note: this notebook assumes you're using Google Colab. You can safely edit and experiment here, or go to `File` -> `Save a copy in Google Drive` to make your own version.



In [1]:
!pip install --quiet instructor groq openai jsonref

In the left sidebar, click the key icon and add two secrets containing your API keys. Be sure to enable "Notebook access" for each one. This is Google Colab's built-in secrets feature; you're not sharing your keys with anyone.

- `OPENAI_API_KEY`: get a key from https://platform.openai.com/api-keys
- `GROQ_API_KEY`: get a key from https://console.groq.com/keys

In [2]:
import openai
import groq
import instructor
from typing import List, Optional
from pydantic import BaseModel, Field
import os

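try:
    # On Colab, read the keys from the Secrets panel.
    from google.colab import userdata
    os.environ['OPENAI_API_KEY'] = '' or userdata.get('OPENAI_API_KEY')  # or put your key in the '' on this line
    os.environ['GROQ_API_KEY'] = '' or userdata.get('GROQ_API_KEY')  # or put your key in the '' on this line
except Exception:
    # Not on Colab, or a secret is unavailable; the check below reports unset keys.
    pass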

if not os.environ.get('OPENAI_API_KEY') or not os.environ.get('GROQ_API_KEY'):
    raise ValueError("Both OPENAI_API_KEY and GROQ_API_KEY environment variables must be set and non-empty. Read the text in the notebook (above this block) for more info.")
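
The check above fails with the same message whichever key is absent. As a minimal sketch (the helper name `missing_keys` is hypothetical and not part of the notebook), the unset names could be reported individually:

```python
import os

def missing_keys(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

# Demo with a fake environment so the sketch runs without real keys.
fake_env = {"OPENAI_API_KEY": "sk-example", "GROQ_API_KEY": ""}
print(missing_keys(["OPENAI_API_KEY", "GROQ_API_KEY"], fake_env))
# → ['GROQ_API_KEY']
```

Calling `missing_keys([...])` without the second argument inspects the real `os.environ`.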


We'll feed the LLM a story and an Instructor model. First, the story...

In [5]:
story_text = """
**The Mysterious Island**

Dr. Maria Rodriguez, a renowned botanist, stood at the bow of the small sailboat, her eyes fixed on the uncharted island rising from the sea. She was joined by her team: Jax, a rugged sailor with a penchant for adventure; Dr. Sophia Patel, a brilliant chemist; Ethan, a tech-savvy engineer; and Maya, a young and ambitious journalist.

Their mission was to explore the island, rumored to be home to a rare, cancer-fighting plant. As they disembarked, the warm sun on their skin and the sweet scent of blooming flowers enveloped them.

"Alright, team," Maria said, "let's get to work. Sophia, can you start collecting plant samples? Ethan, set up the lab equipment. Maya, see if you can find any signs of previous visitors. Jax, take point on security. And I'll start surveying the island's terrain."

As they dispersed, Maya stumbled upon a cryptic message scrawled on a palm tree: "Turn back while you still can." She showed it to the others, but they were undeterred.

That night, as they sat around a crackling campfire, Sophia announced that she had found a strange, glowing plant with remarkable healing properties. But their celebration was short-lived, as a loud rumble shook the island, and a massive stone door hidden in the jungle floor swung open, revealing an ancient temple.

"What have we stumbled into?" Ethan asked, his eyes wide with wonder.

As they cautiously entered the temple, they discovered ancient artifacts and mysterious symbols etched into the walls. Suddenly, the air was filled with an otherworldly energy, and the team found themselves face to face with an ancient, powerful entity.

"Who are you?" Maria asked, her voice steady.

"We have been waiting for you," the entity replied. "You have freed us from our slumber. We will grant you one wish, but be warned: be careful what you wish for."

The team exchanged nervous glances, weighing the risks and possibilities. Maya spoke up, "We wish for the knowledge and power to heal the world's diseases, but only if used for the greater good."

The entity nodded, and with a burst of light, the team was transformed, their minds flooded with ancient secrets and their bodies infused with the island's mystical energy. As they sailed away from the island, they knew that their lives – and the fate of humanity – would never be the same.
"""

## Extract a simple list
Extract a list of `Quote` objects, i.e., things that were explicitly said in the story.

In [6]:
class Quote(BaseModel):
    speaker: str = Field(description="Who said the quote.")
    quote: str = Field(description="Verbatim quote from the story.")

inference_provider = "openai"   # "openai" or "groq"

client = instructor.from_openai(openai.OpenAI()) if inference_provider == "openai" else instructor.from_groq(groq.Groq())
quotes = client.chat.completions.create(
    model="llama3-70b-8192" if inference_provider == "groq" else "gpt-4-turbo",
    messages=[
        {"role": "user", "content": story_text},
    ],
    response_model=List[Quote],   # this is the Instructor magic
    temperature=0.5,
)

from rich import print as rprint  # import once, outside the loop

for quote in quotes:
    rprint(quote)
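
Because the `quote` field asks for verbatim text, one way to sanity-check the result is to look for each extracted quote in the source story. The sketch below is an illustration only: `QuoteStub` and `find_non_verbatim` are hypothetical names, and a plain dataclass stands in for the `Quote` model so the example runs without an API call.

```python
from dataclasses import dataclass

@dataclass
class QuoteStub:
    """Hypothetical stand-in for the notebook's Quote model."""
    speaker: str
    quote: str

def normalize(text: str) -> str:
    """Collapse runs of whitespace and strip surrounding double quotes."""
    return " ".join(text.split()).strip('"')

def find_non_verbatim(quotes, source_text):
    """Return the quotes whose text does not appear in source_text."""
    haystack = normalize(source_text)
    return [q for q in quotes if normalize(q.quote) not in haystack]

sample_story = 'As they entered, Ethan asked, "What have we stumbled into?"'
sample_quotes = [
    QuoteStub("Ethan", "What have we stumbled into?"),
    QuoteStub("Maria", "We should leave at once."),  # not in the sample text
]
suspect = find_non_verbatim(sample_quotes, sample_story)
print([q.speaker for q in suspect])
# → ['Maria']
```

A substring check like this is deliberately strict: a line of dialogue that the story interrupts with an attribution (such as `"Alright, team," Maria said, "let's get to work..."`) would be flagged if the model joins the two halves, so flagged quotes are candidates for review rather than confirmed hallucinations.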

## Extract using a more complicated model
Here, `Story` contains a List of `Person` objects in addition to other fields.

Note that several fields in the `Person` model (such as `backstory` and `perspective`) ask the LLM to be creative. This can lead to hallucinations, which is either good or bad depending on what you want the model to do. Compare `hair_color` (a required `str`) with `favorite_shoes` (an `Optional[str]`) in the output to see how `Optional` affects this.
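
To make the effect of `Optional` concrete, here is a small Pydantic-only sketch that needs no LLM call. The class names `RequiredHair` and `OptionalHair` are hypothetical stand-ins for the two field styles used in `Person`.

```python
from typing import Optional
from pydantic import BaseModel, ValidationError

class RequiredHair(BaseModel):
    hair_color: str  # a string must always be supplied

class OptionalHair(BaseModel):
    hair_color: Optional[str]  # None is an accepted value

# "Not mentioned in the story" can be expressed as None for the optional field.
print(OptionalHair(hair_color=None).hair_color)

# The required field rejects None, so a response has to contain some string
# to pass validation, even when the story never mentions hair color.
try:
    RequiredHair(hair_color=None)
    rejected = False
except ValidationError:
    rejected = True
print(rejected)
```

This is why a required `str` field with an "only if mentioned" description tends to invite invented values, while an `Optional[str]` field gives the model a valid way to say nothing.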

In [7]:
class Person(BaseModel):
    name: str = Field(description="Character from the story.")
    backstory: str = Field(description="Backstory of the character.")
    perspective: str = Field(description="Tell the story through the character's eyes.")
    hair_color: str = Field(description="The character's hair color only if mentioned explicitly in the story.")
    favorite_shoes: Optional[str] = Field(description="The character's favorite shoe brand only if mentioned explicitly in the story.")

class Story(BaseModel):
    story: str = Field(description="2-sentence summary")
    title: str = Field(description="The title")
    characters: List[Person] = Field(description="Characters")

inference_provider = "openai"   # "openai" or "groq"

client = instructor.from_openai(openai.OpenAI()) if inference_provider == "openai" else instructor.from_groq(groq.Groq())
story = client.chat.completions.create(
    model="llama3-70b-8192" if inference_provider == "groq" else "gpt-4o",
    messages=[
        {"role": "user", "content": story_text},
    ],
    response_model=Story,
    temperature=0.5,
)

print(story.model_dump_json(indent=4))

{
    "story": "Dr. Maria Rodriguez, a renowned botanist, stood at the bow of the small sailboat, her eyes fixed on the uncharted island rising from the sea. She was joined by her team: Jax, a rugged sailor with a penchant for adventure; Dr. Sophia Patel, a brilliant chemist; Ethan, a tech-savvy engineer; and Maya, a young and ambitious journalist. Their mission was to explore the island, rumored to be home to a rare, cancer-fighting plant. As they disembarked, the warm sun on their skin and the sweet scent of blooming flowers enveloped them. \"Alright, team,\" Maria said, \"let's get to work. Sophia, can you start collecting plant samples? Ethan, set up the lab equipment. Maya, see if you can find any signs of previous visitors. Jax, take point on security. And I'll start surveying the island's terrain.\" As they dispersed, Maya stumbled upon a cryptic message scrawled on a palm tree: \"Turn back while you still can.\" She showed it to the others, but they were undeterred. That night,