# Structured output
## Introduction
- In applications we often want the LLM to return structured output that follows a schema.
- We can simply ask it to return JSON, but the reply is free text: it may be wrapped in Markdown or extra prose, and nothing checks it against the fields and types we expect.
- In this lesson we'll show how to use Pydantic to define the schema and have the LLM return structured output that matches it.

## Installation

In [22]:
%pip install -q langchain langchain-openai pydantic

Note: you may need to restart the kernel to use updated packages.


## Simple JSON

Let's start with a simple question and ask the model to return the answer as JSON.

In [23]:
from langchain_openai import ChatOpenAI

# Initialize the language model
llm_model = "gpt-4o-mini"
llm = ChatOpenAI(temperature=0, model=llm_model)

We ask it to return JSON with the fields `firstname`, `lastname` and `title`.

In [24]:
question = "My name is Patrick Debois. I am a genAI expert. Please return this information as a json using the fields firstname, lastname, title"
result = llm.invoke(question)
print(result.content)

```json
{
  "firstname": "Patrick",
  "lastname": "Debois",
  "title": "genAI expert"
}
```


You can see that it doesn't actually return raw JSON, but a Markdown-formatted JSON code block.
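
Because the reply is wrapped in a Markdown fence, passing it straight to `json.loads` would fail. Below is a minimal sketch of stripping the fence by hand with the standard library only. The helper name `extract_json_block` and the hard-coded `reply` string are illustrative assumptions, not part of LangChain.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built in code so this example stays readable

# Hard-coded stand-in for result.content from the cell above (no API call needed).
reply = "\n".join(
    [
        FENCE + "json",
        "{",
        '  "firstname": "Patrick",',
        '  "lastname": "Debois",',
        '  "title": "genAI expert"',
        "}",
        FENCE,
    ]
)


def extract_json_block(text: str) -> dict:
    """Parse the JSON inside the first Markdown fence, or the whole text if unfenced."""
    match = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, text, re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload)


person = extract_json_block(reply)
print(person["firstname"], person["lastname"])
```

This works for the reply shown here, but it is hand-rolled cleanup that has to be repeated for every prompt.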

Now we make the request a bit more complicated by asking for an array of people.

In [25]:
question = "My name is Patrick Debois. I am a genAI expert. I got two eyes. Please return this information as a json with an Array of people using the fields firstname, lastname, title and number of eyes."
result = llm.invoke(question)
print(result.content)

Here is the information formatted as a JSON object with an array of people:

```json
{
  "people": [
    {
      "firstname": "Patrick",
      "lastname": "Debois",
      "title": "genAI expert",
      "number_of_eyes": 2
    }
  ]
}
```
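
This time the reply also starts with a sentence of prose, so the raw text is even further from valid JSON. Below is a rough sketch of a more tolerant cleanup that slices from the first `{` to the last `}`. It assumes the reply contains exactly one JSON object, and `slice_json_object` is a hypothetical helper name.

```python
import json

FENCE = "`" * 3  # three backticks, built in code so this example stays readable

# Hard-coded stand-in for the reply above: prose first, then a fenced JSON payload.
reply = (
    "Here is the information formatted as a JSON object with an array of people:\n\n"
    + FENCE + "json\n"
    + '{"people": [{"firstname": "Patrick", "lastname": "Debois", '
    + '"title": "genAI expert", "number_of_eyes": 2}]}\n'
    + FENCE
)


def slice_json_object(text: str) -> dict:
    """Parse the text between the first '{' and the last '}' (assumes one JSON object)."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    return json.loads(text[start : end + 1])


try:
    json.loads(reply)  # the leading prose makes the raw reply invalid JSON
    raw_is_valid = True
except json.JSONDecodeError:
    raw_is_valid = False

data = slice_json_object(reply)
print(raw_is_valid, data["people"][0]["number_of_eyes"])
```

Even after this cleanup, nothing checks the field names or types: the key `number_of_eyes`, for instance, was chosen by the model.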


## Pydantic Library

To improve on this, we will use the `pydantic` library to define an object with schema checking.

In [26]:
from typing import List

from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from pydantic import BaseModel, Field

## Defining the Object properties using a schema

We define a basic `Person` object with `firstname`, `lastname`, `age` and `eyes`.

Note that we can specify the type of `age`, indicating that we want an integer and not a string.

In [27]:
# Define your desired data structure.
class Person(BaseModel):
    firstname: str = Field(description="person first name")
    lastname: str = Field(description="person last name")
    age: int = Field(description="person age")
    eyes: int = Field(description="eyes count")
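
To see what the type annotations buy us, here is a small sketch of validating data against the same `Person` model without involving an LLM. The input values are made up for illustration. It relies on Pydantic's default (non-strict) validation, which converts a numeric string to an integer.

```python
from pydantic import BaseModel, Field, ValidationError


class Person(BaseModel):
    firstname: str = Field(description="person first name")
    lastname: str = Field(description="person last name")
    age: int = Field(description="person age")
    eyes: int = Field(description="eyes count")


# A numeric string is converted to int by Pydantic's default validation.
person = Person(firstname="Patrick", lastname="Debois", age="55", eyes=2)
print(type(person.age).__name__, person.age)

# A value that cannot be read as an integer is rejected.
try:
    Person(firstname="Patrick", lastname="Debois", age="fifty-five", eyes=2)
    rejected = False
except ValidationError as err:
    rejected = True
    print(f"validation failed with {len(err.errors())} error(s)")
```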

## Turning a Pydantic model into a Parser
To use the schema, we take the object definition and turn it into a LangChain Pydantic parser.
The parser can then return the format instructions that it would pass to the LLM.

In [28]:
# Set up a parser and print the format instructions it generates.
person_parser = PydanticOutputParser(pydantic_object=Person)
print(person_parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"firstname": {"description": "person first name", "title": "Firstname", "type": "string"}, "lastname": {"description": "person last name", "title": "Lastname", "type": "string"}, "age": {"description": "person age", "title": "Age", "type": "integer"}, "eyes": {"description": "eyes count", "title": "Eyes", "type": "integer"}}, "required": ["firstname", "lastname", "age", "eyes"]}
```


All of these instructions can now be passed to the LLM, for example in a system prompt.

## Adding the instructions to the prompt
Now that we have instructions, adding them to the prompt template is easy.

In [29]:
# And a query intended to prompt a language model to populate the data structure.
name_query = "My name is Patrick Debois. I'm 55 years old. I got two eyes. What is my first name, last name, age and number of eyes?"

prompt_template = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": person_parser.get_format_instructions()},
)

prompt = prompt_template.format(query=name_query)
answer = llm.invoke(prompt)
print(answer.content)

```json
{
  "firstname": "Patrick",
  "lastname": "Debois",
  "age": 55,
  "eyes": 2
}
```


Example adapted from <https://python.langchain.com/v0.1/docs/modules/model_io/output_parsers/types/pydantic/>

## A List of things

Now we can do the same for our list-of-people example:

In [30]:
class People(BaseModel):
    friends: List[Person] = Field(description="a list of persons")

# Set up a parser + inject instructions into the prompt template.
people_parser = PydanticOutputParser(pydantic_object=People)
print(people_parser.get_format_instructions())

# And a query intended to prompt a language model to populate the data structure.
name_query = "My name is Patrick Debois. I'm 55 years old. What do you know about me?"

prompt_template = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": people_parser.get_format_instructions()},
)

prompt = prompt_template.format(query=name_query)
answer = llm.invoke(prompt)

print("======== the json ==========")
print(answer.content)

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"$defs": {"Person": {"properties": {"firstname": {"description": "person first name", "title": "Firstname", "type": "string"}, "lastname": {"description": "person last name", "title": "Lastname", "type": "string"}, "age": {"description": "person age", "title": "Age", "type": "integer"}, "eyes": {"description": "eyes count", "title": "Eyes", "type": "integer"}}, "required": ["firstname", "lastname", "age", "eyes"], "title": "Person", "type": "object"}}, "properties": {"friends": {"description": "a list of persons", "items": {"$ref": "#/$defs/P
```

- Now we can use the parser to parse the answer and get an object back.
- There is no need to extract the JSON from the Markdown ourselves; the parser does this for us.

In [31]:
people = people_parser.parse(answer.content)
print(people)

friends=[]

The `friends` list comes back empty, presumably because the schema asks for a list of friends while the query only describes the speaker.
