# How to return structured data from a model

It is often useful to have a model return output that matches a specific schema. One common use case is extracting data from text to insert into a database or use with some other downstream system. This guide covers a few strategies for getting structured outputs from a model.

## The `.with_structured_output()` method

This is the easiest and most reliable way to get structured outputs. `.with_structured_output()` is implemented for models that provide native APIs for structuring outputs, like tool/function calling or JSON mode, and makes use of these capabilities under the hood.

This method takes a schema as input, which specifies the names, types, and descriptions of the desired output attributes. The method returns a model-like Runnable, except that instead of outputting strings or Messages it outputs objects corresponding to the given schema. The schema can be specified as a TypedDict class, a JSON Schema, or a Pydantic class. If a TypedDict or JSON Schema is used, the Runnable returns a dictionary; if a Pydantic class is used, it returns a Pydantic object.

As an example, let's get a model to generate a joke and separate the setup from the punchline:

In [2]:
import os
import yaml
from langchain_community.chat_models import ChatZhipuAI


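# Load the API key from a local YAML config file (expects an `api_key` entry).
with open('../utils/config.yml', 'r') as stream:
    zhipuai_api_key = yaml.safe_load(stream)['api_key']

# ChatZhipuAI reads the key from this environment variable.
os.environ['ZHIPUAI_API_KEY'] = zhipuai_api_key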

chat_model = ChatZhipuAI(
    model='glm-4-plus',
    temperature=0,
)

## Pydantic class

If we want the model to return a Pydantic object, we just need to pass in the desired Pydantic class. The key advantage of using Pydantic is that the model-generated output will be validated. Pydantic will raise an error if any required fields are missing or if any fields are of the wrong type.

In [3]:
from typing import Optional
from langchain_core.pydantic_v1 import BaseModel, Field

# Pydantic
class Joke(BaseModel):
    '''Joke to tell user.'''
    
    setup: str = Field(description='The setup of the joke.')
    punchline: str = Field(description='The punchline to the joke.')
    rating: Optional[int] = Field(
        default=None,
        description='How funny the joke is, from 1 to 10.'
    )
    
structured_llm = chat_model.with_structured_output(Joke)
structured_llm.invoke('Tell me a joke about cats')

Joke(setup="Why don't cats play poker in the jungle?", punchline='Too many cheetahs.', rating=None)
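
To see what that validation looks like without calling a model, the sketch below builds `Joke` objects by hand from dictionaries shaped like model output. It assumes the `pydantic` package can be imported directly (the cell above imports the same names through `langchain_core.pydantic_v1`). The `try_parse` helper and the sample payloads are invented for illustration.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class Joke(BaseModel):
    """Joke to tell user."""

    setup: str = Field(description="The setup of the joke.")
    punchline: str = Field(description="The punchline to the joke.")
    rating: Optional[int] = Field(
        default=None, description="How funny the joke is, from 1 to 10."
    )


def try_parse(payload: dict) -> Optional[Joke]:
    """Hypothetical helper: return a validated Joke, or None if Pydantic rejects it."""
    try:
        return Joke(**payload)
    except ValidationError as err:
        print(f"Rejected: {len(err.errors())} validation error(s)")
        return None


# Invented payloads standing in for model output.
valid = try_parse({"setup": "Knock knock", "punchline": "Who's there?"})
missing_field = try_parse({"setup": "Knock knock"})
wrong_type = try_parse(
    {"setup": "Knock knock", "punchline": "Who's there?", "rating": "very funny"}
)
```

Only the first payload yields a `Joke`. The second is missing the required `punchline`, and the third has a `rating` that cannot be read as an integer.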

## TypedDict or JSON Schema

If you don't want to use Pydantic, explicitly don't want validation of the arguments, or want to be able to stream the model outputs, you can define your schema using a TypedDict class. We can optionally use a special `Annotated` syntax supported by LangChain that allows you to specify the default value and description of a field. Note that the default value is *not* filled in automatically if the model doesn't generate it; it is only used in defining the schema that is passed to the model.

In [4]:
from typing_extensions import Annotated, TypedDict

# TypedDict
class Joke(TypedDict):
    '''Joke to tell the user.'''

    setup: Annotated[str, ..., 'The setup of the joke.']

    # Alternatively, we could have specified setup as:
    #
    # setup: str                      # no default, no description
    # setup: Annotated[str, ...]      # no default, no description
    # setup: Annotated[str, 'foo']    # default, no description

    punchline: Annotated[str, ..., 'The punchline to the joke.']
    rating: Annotated[Optional[int], None, 'How funny the joke is, from 1 to 10.']
    
structured_llm = chat_model.with_structured_output(Joke)
structured_llm.invoke('Tell me a joke about cats')

{'setup': "Why don't cats play poker in the jungle?",
 'punchline': 'Too many cheetahs.'}
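
The result above has no `rating` key, which shows that the declared default was not filled in. If downstream code expects every key to be present, one option is to apply the defaults yourself. The sketch below is a hypothetical helper, not part of LangChain. It assumes the `Annotated[type, default, description]` convention used in the cell above, and it imports `Annotated` from `typing` (Python 3.9+) rather than `typing_extensions`.

```python
from typing import Annotated, Optional, TypedDict, get_type_hints


class Joke(TypedDict):
    """Joke to tell user."""

    setup: Annotated[str, ..., "The setup of the joke."]
    punchline: Annotated[str, ..., "The punchline to the joke."]
    rating: Annotated[Optional[int], None, "How funny the joke is, from 1 to 10."]


def fill_defaults(result: dict, schema: type) -> dict:
    """Hypothetical helper: copy `result`, adding schema defaults for missing keys."""
    filled = dict(result)
    hints = get_type_hints(schema, include_extras=True)
    for name, hint in hints.items():
        metadata = getattr(hint, "__metadata__", ())
        # The first Annotated argument after the type is the default;
        # Ellipsis (...) marks a field that has no default.
        if name not in filled and metadata and metadata[0] is not Ellipsis:
            filled[name] = metadata[0]
    return filled


raw = {
    "setup": "Why don't cats play poker in the jungle?",
    "punchline": "Too many cheetahs.",
}
joke = fill_defaults(raw, Joke)
```

Here `joke` gains `'rating': None`, while `raw` is left unchanged.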

Equivalently, we can pass in a JSON Schema dict. This requires no imports or classes and makes it very clear exactly how each parameter is documented, at the cost of being a bit more verbose.

In [5]:
json_schema = {
    'title': 'joke',
    'description': 'A joke to tell user.',
    'type': 'object',
    'properties': {
        'setup': {
            'type': 'string',
            'description': 'The setup of the joke.',
        },
        'punchline': {
            'type': 'string',
            'description': 'The punchline to the joke.',
        },
        'rating': {
            'type': 'integer',
            'description': 'How funny the joke is, from 1 to 10.',
            'default': None,
        },
    },
    'required': ['setup', 'punchline', 'rating'],
}

structured_llm = chat_model.with_structured_output(json_schema)
structured_llm.invoke('Tell me a joke about cats')

{'setup': "Why don't cats play poker in the jungle?",
 'punchline': 'Too many cheetahs.',
 'rating': 7}
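
Because dictionary outputs are not validated the way Pydantic objects are, it can be worth running a light sanity check before passing them downstream. The sketch below is a hypothetical, standard-library-only helper that checks required keys and a few primitive types. It is not a full JSON Schema validator, and the trimmed schema and sample results are invented for illustration.

```python
# Maps JSON Schema primitive type names to Python types (only the few used here).
JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def find_problems(result: dict, schema: dict) -> list:
    """Hypothetical helper: list missing required keys and wrongly typed values."""
    problems = []
    for key in schema.get("required", []):
        if key not in result:
            problems.append(f"missing required key: {key}")
    for key, spec in schema.get("properties", {}).items():
        expected = JSON_TYPES.get(spec.get("type"))
        if key in result and expected and not isinstance(result[key], expected):
            problems.append(f"wrong type for {key}: expected {spec['type']}")
    return problems


# A trimmed schema in the same shape as the one above.
schema = {
    "title": "joke",
    "type": "object",
    "properties": {
        "setup": {"type": "string"},
        "punchline": {"type": "string"},
        "rating": {"type": "integer"},
    },
    "required": ["setup", "punchline"],
}

ok = find_problems({"setup": "a", "punchline": "b", "rating": 7}, schema)
bad = find_problems({"setup": "a", "rating": "seven"}, schema)
```

For anything beyond a quick check like this, a dedicated JSON Schema validation library is likely the better choice.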

## Choosing between multiple schemas

The simplest way to let the model choose from multiple schemas is to create a parent schema that has a Union-typed attribute:

In [6]:
from typing import Union

# Pydantic
class Joke(BaseModel):
    setup: str = Field(description='The setup of the joke.')
    punchline: str = Field(description='The punchline to the joke.')
    rating: Optional[int] = Field(
        default=None, description='How funny the joke is, from 1 to 10.'
    )
    
class ConversationResponse(BaseModel):
    '''Respond in a conversational manner. Be kind and helpful.'''
    
    response: str = Field(description="A conversational response to the user's query")
    
class Response(BaseModel):
    output: Union[Joke, ConversationResponse]
    
structured_llm = chat_model.with_structured_output(Response)
structured_llm.invoke('Tell me a joke about cats')

Response(output=Joke(setup="Why don't cats play poker in the jungle?", punchline='Too many cheetahs.', rating=None))

In [7]:
structured_llm.invoke('How are you?')

Response(output=ConversationResponse(response="I'm just a computer program, so I don't have feelings, but I'm ready and functioning properly! How can I assist you today?"))
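
Since `output` can hold either schema, calling code usually needs to branch on which one came back. The sketch below shows one way to do that with `isinstance`. It assumes `pydantic` can be imported directly, builds the `Response` objects by hand instead of calling the model, and uses a hypothetical `render` helper with invented sample text.

```python
from typing import Optional, Union

from pydantic import BaseModel, Field


class Joke(BaseModel):
    setup: str = Field(description="The setup of the joke.")
    punchline: str = Field(description="The punchline to the joke.")
    rating: Optional[int] = Field(
        default=None, description="How funny the joke is, from 1 to 10."
    )


class ConversationResponse(BaseModel):
    response: str = Field(description="A conversational response to the user's query")


class Response(BaseModel):
    output: Union[Joke, ConversationResponse]


def render(result: Response) -> str:
    """Hypothetical helper: turn either branch of the Union into display text."""
    if isinstance(result.output, Joke):
        return f"{result.output.setup} {result.output.punchline}"
    return result.output.response


# Hand-built results standing in for structured_llm.invoke(...) output.
joke_text = render(Response(output=Joke(setup="Knock knock.", punchline="Who's there?")))
chat_text = render(Response(output=ConversationResponse(response="Doing well, thanks!")))
```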

## Streaming

We can stream outputs from our structured model when the output type is a dict (i.e., when the schema is specified as a TypedDict class or JSON Schema dict).

In [12]:
from typing_extensions import Annotated, TypedDict

# TypedDict
class Joke(TypedDict):
    """Joke to tell user."""
    
    setup: Annotated[str, ..., 'The setup of the joke.']
    punchline: Annotated[str, ..., 'The punchline to the joke.']
    rating: Annotated[Optional[int],None,'How funny the joke is, from 1 to 10.']
    
structured_llm = chat_model.with_structured_output(Joke)
for chunk in structured_llm.stream('Tell me a joke about dogs'):
    print(chunk)
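
The chunks yielded by a structured stream are, as far as we know, cumulative snapshots of the partially parsed output rather than deltas, so the last chunk is the complete result. The sketch below illustrates that consumption pattern with a hypothetical generator standing in for `structured_llm.stream(...)`. The chunk contents are invented and are not real model output.

```python
from typing import Iterator


def fake_structured_stream() -> Iterator[dict]:
    """Hypothetical stand-in for structured_llm.stream(...): yields cumulative dicts."""
    yield {}
    yield {"setup": "Why did the dog"}
    yield {"setup": "Why did the dog sit in the shade?"}
    yield {
        "setup": "Why did the dog sit in the shade?",
        "punchline": "It didn't want to be a hot dog.",
    }


final = {}
for chunk in fake_structured_stream():
    # Each chunk supersedes the previous one; there are no deltas to concatenate.
    final = chunk
```

With this pattern, a UI can re-render from each chunk as it arrives and simply keep the last one as the finished object.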

## Few-shot prompting

For more complex schemas, it's very useful to add few-shot examples to the prompt. This can be done in a few ways. The simplest and most universal way is to add examples to a system message in the prompt:

In [13]:
from langchain_core.prompts import ChatPromptTemplate

system = """You are a hilarious comedian. Your specialty is knock-knock jokes. \
Return a joke which has the setup (the response to "Who's there?") and the final punchline (the response to "<setup> who?").

Here are some examples of jokes:

example_user: Tell me a joke about planes
example_assistant: {{"setup": "Why don't planes ever get tired?", "punchline": "Because they have rest wings!", "rating": 2}}

example_user: Tell me another joke about planes
example_assistant: {{"setup": "Cargo", "punchline": "Cargo 'vroom vroom', but planes go 'zoom zoom'!", "rating": 10}}

example_user: Now about caterpillars
example_assistant: {{"setup": "Caterpillar", "punchline": "Caterpillar really slow, but watch me turn into a butterfly and steal the show!", "rating": 5}}"""

prompt = ChatPromptTemplate.from_messages([("system", system), ("human", "{input}")])

few_shot_structured_llm = prompt | structured_llm
few_shot_structured_llm.invoke("what's something funny about woodpeckers")

{'setup': 'Woodpecker',
 'punchline': "Woodpecker knock-knock on trees, but they never get a 'who's there?'!",
 'rating': 7}
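
The doubled braces (`{{` and `}}`) around the JSON examples in the system message are there because `ChatPromptTemplate` uses f-string-style formatting by default: single braces mark template variables such as `{input}`, so literal braces must be escaped. The sketch below uses plain `str.format` as a stand-in to show the same escaping rule. The shortened template is invented for illustration.

```python
# A shortened, invented template in the same style as the system prompt above.
template = 'example_assistant: {{"setup": "Cargo", "rating": 10}}\nuser: {input}'

# str.format follows the same escaping rule: {{ and }} become literal braces,
# while {input} is replaced by the supplied value.
rendered = template.format(input="Tell me a joke about trains")
print(rendered)
```

Forgetting to double the braces would make the formatter treat the JSON keys as missing template variables.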