## Getting Started with Instructor (Python)

According to its author, Jason Liu, Instructor is built on top of Pydantic and the OpenAI SDK. The OpenAI SDK has become the de facto tool for interacting with language models. Many other LLM API providers, such as Google, Anthropic, Cohere, and Replicate, offer an alternate configuration (i.e. `base_url` and `api_key`) that lets the OpenAI library communicate with their own models.

Pydantic is a popular Python library for data validation using type annotations: it makes it easy to validate complex data structures and can automatically generate JSON Schemas from them. It has become a critical component of major generative AI Python tooling such as the OpenAI SDK, LangChain, and LlamaIndex, and a go-to tool among GenAI developers for validating input and output data, thanks to its developer experience, strong type safety, and smooth integration with modern Python applications and frameworks.
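
Schema generation and validation are roughly the two Pydantic features Instructor leans on: the response model's JSON Schema tells the LLM what shape to produce, and the reply is validated against the same class. The sketch below exercises both locally, with no API call, using the `ExtractUser` fields from the extraction example in this notebook; the `Field` description is an illustrative addition, not part of the original model.

```python
from pydantic import BaseModel, Field, ValidationError


class ExtractUser(BaseModel):
    first_name: str
    last_name: str
    # Field descriptions are copied into the generated JSON Schema
    birth_year: int = Field(description="Four-digit year of birth")
    birth_place: str


# 1. Schema generation: a JSON Schema derived from the type annotations
schema = ExtractUser.model_json_schema()
print(schema["required"])  # → ['first_name', 'last_name', 'birth_year', 'birth_place']

# 2. Validation with coercion: the numeric string "1990" becomes the int 1990
user = ExtractUser.model_validate_json(
    '{"first_name": "Ahmad", "last_name": "Adewale",'
    ' "birth_year": "1990", "birth_place": "Kano"}'
)
print(user.birth_year + 1)  # → 1991

# 3. Rejection: a value that cannot be coerced raises ValidationError
try:
    ExtractUser.model_validate(
        {"first_name": "Ahmad", "last_name": "Adewale",
         "birth_year": "unknown", "birth_place": "Kano"}
    )
except ValidationError as exc:
    print(exc.errors()[0]["loc"])  # → ('birth_year',)
```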


### Install Instructor & Dependencies

```python
%%capture
%pip install -U instructor
%pip install python-dotenv
```

```python
%pip freeze | grep instructor
```

```
instructor==1.7.0
Note: you may need to restart the kernel to use updated packages.
```
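
`%pip freeze | grep instructor` depends on `grep` being available to the shell, which is not the case in a default Windows setup. A portable alternative reads the version from installed package metadata with the standard library's `importlib.metadata`; `installed_version` is a hypothetical helper name used only for this sketch.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints e.g. the same version string that pip freeze shows, or None
print(installed_version("instructor"))
```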


```python
# Load environment variables (e.g. OPENAI_API_KEY) from a .env file.
# In a notebook, find_dotenv() starts in the working directory and walks up
# through its parents, so a .env at the repository root is still found.
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv())
```

```
True
```
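
The `True` above indicates that python-dotenv found a `.env` file containing at least one variable; it does not by itself confirm that `OPENAI_API_KEY`, which the `OpenAI()` client reads by default, is among them. A minimal fail-fast check, using a hypothetical `require_env` helper, could look like this:

```python
import os


def require_env(name: str) -> str:
    """Hypothetical helper: return the variable's value, or fail with a clear message."""
    value = os.getenv(name)
    if not value:  # treats both "missing" and "empty string" as not set
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

Calling `require_env("OPENAI_API_KEY")` right after `load_dotenv` reports a missing or empty key in the setup cell, next to the `.env` loading, rather than later when the client is constructed or the first request is made.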

### OpenAI LLMs

```python
import instructor
from pydantic import BaseModel
from openai import OpenAI

# Define your desired output structure
class ExtractUser(BaseModel):
    first_name: str
    last_name: str
    birth_year: int
    birth_place: str

# Patch the OpenAI client so create() accepts a response_model;
# OpenAI() reads OPENAI_API_KEY from the environment loaded above
client = instructor.from_openai(OpenAI())

# Extract structured data from natural language
res = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=ExtractUser,
    messages=[{"role": "user", "content": "Ahmad Adewale was born in Kano in 1990"}],
)
print(res.model_dump_json())
```

```
{"first_name":"Ahmad","last_name":"Adewale","birth_year":1990,"birth_place":"Kano"}
```


```python
# Check that each field was extracted with the expected value and type
assert res.first_name == "Ahmad"
assert res.last_name == "Adewale"
assert res.birth_place == "Kano"
assert res.birth_year == 1990
```
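
Because the response model is an ordinary Pydantic class, you can attach validation rules to it. The sketch below assumes you want to reject implausible birth years; the `birth_year_is_plausible` validator and its 1900 lower bound are illustrative choices, not part of the original example. When such a model is passed as `response_model`, Instructor can re-ask the LLM with the validation error if an answer fails, within the budget set by its `max_retries` argument; the commented call shows where that would go and needs the API key loaded earlier.

```python
from datetime import date

from pydantic import BaseModel, ValidationError, field_validator


class ExtractUser(BaseModel):
    first_name: str
    last_name: str
    birth_year: int
    birth_place: str

    # Illustrative rule: reject years before 1900 or after the current year
    @field_validator("birth_year")
    @classmethod
    def birth_year_is_plausible(cls, value: int) -> int:
        if not 1900 <= value <= date.today().year:
            raise ValueError(f"birth_year {value} is not a plausible year")
        return value


# The validator runs on any input, so it can be checked without an API call
print(ExtractUser(first_name="Ahmad", last_name="Adewale",
                  birth_year=1990, birth_place="Kano").birth_year)  # → 1990

try:
    ExtractUser(first_name="Ahmad", last_name="Adewale",
                birth_year=190, birth_place="Kano")
except ValidationError as exc:
    print(exc.error_count())  # → 1

# With the Instructor-patched client from the earlier cell (needs OPENAI_API_KEY):
# res = client.chat.completions.create(
#     model="gpt-4o-mini",
#     response_model=ExtractUser,
#     max_retries=2,
#     messages=[{"role": "user", "content": "Ahmad Adewale was born in Kano in 1990"}],
# )
```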