# Returning structured output from an LLM

## Method 1: Using Pydantic Models in LangChain

- Pydantic is a Python library for data validation and parsing based on type annotations. LangChain accepts Pydantic models as output schemas so that LLM responses come back structured and validated.
- The LLM is asked to generate a response that conforms to the predefined schema, and Pydantic then checks the result against the declared field types and constraints.

Why use it?
- Ensures data reliability and reduces errors.
- Simplifies the process of integrating LLM-generated data into downstream applications.
- Enforces strict adherence to expected formats, reducing manual validation.
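
To make the validation step concrete without calling a model, here is a minimal sketch that uses Pydantic alone. It assumes Pydantic v2 (`model_validate`); the `City` model and the sample dictionaries are hypothetical stand-ins for raw LLM output and are not part of the original notebook.

```python
from pydantic import BaseModel, Field, ValidationError

# Hypothetical schema used only for this illustration.
class City(BaseModel):
    name: str = Field(description="Name of the city")
    population: int = Field(description="Population of the city")

# Well-formed data: in Pydantic's default (lax) mode a numeric string is coerced to int.
good = City.model_validate({"name": "Tokyo", "population": "13960000"})
print(good.population + 1)

# Malformed data: a non-numeric population is rejected with a ValidationError.
try:
    City.model_validate({"name": "Tokyo", "population": "about 14 million"})
    rejected = False
except ValidationError as exc:
    rejected = True
    print(exc.error_count(), "validation error(s)")
```

The same kind of check runs on the model's response when a Pydantic class is passed to `with_structured_output`, which is what reduces the need for manual validation.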

1. Define a schema using Pydantic's BaseModel

In [6]:
import os
from typing import Optional

from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

# Load variables from a local .env file into the environment
load_dotenv()
# Kept for reference; ChatOpenAI also reads OPENAI_API_KEY from the environment
openai_api_key = os.getenv("OPENAI_API_KEY")
llm = ChatOpenAI(model="gpt-4o-mini")

# Pydantic class: the model-generated output will be validated against it
class CountryDetails(BaseModel):
    """Fetch details of a country"""
    Country: str = Field(description="Name of the Country")
    Last_Known_Population: int = Field(description="Population of the country")
    Capital: str = Field(description="Capital of the country")
    # Optional field: defaults to None when the model does not supply a value
    President: Optional[str] = Field(default=None, description="President of the country")

2. Pass the schema to with_structured_output in LangChain


In [2]:
structured_llm = llm.with_structured_output(CountryDetails)
structured_llm.invoke("Tell me about any country")

CountryDetails(Country='Japan', Last_Known_Population=126476461, Capital='Tokyo', President='Fumio Kishida')

## Method 2: Using JSON Schema in LangChain

- JSON Schema is a standard format for describing the structure and validation rules of JSON objects. LangChain can pass such a schema to the model to shape its output.

- The response is generated in the format the schema defines and is returned as a plain Python dict. Unlike the Pydantic route, LangChain does not validate a dict result against the schema, so the output is predictable in shape but should still be checked before use.

Why use it?

- Can be more flexible than Pydantic when the schema is built dynamically or already exists as JSON.
- Easy to share and integrate with non-Python systems.
- Simplifies compliance with pre-existing standards for data representation.
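
The two methods are closely related: a Pydantic model can emit a JSON Schema that describes itself. The sketch below assumes Pydantic v2 (`model_json_schema`) and repeats a trimmed copy of the `CountryDetails` model from Method 1 so that it runs on its own.

```python
import json
from typing import Optional

from pydantic import BaseModel, Field

# Trimmed copy of the Method 1 model, repeated here so the example is self-contained.
class CountryDetails(BaseModel):
    """Fetch details of a country"""
    Country: str = Field(description="Name of the Country")
    Capital: str = Field(description="Capital of the country")
    President: Optional[str] = Field(default=None, description="President of the country")

# Export the model as a JSON Schema dictionary and pretty-print it.
schema = CountryDetails.model_json_schema()
print(json.dumps(schema, indent=2))
```

Fields without a default end up in the schema's `required` list, while the optional `President` field does not. This is one way to share a Python-defined schema with non-Python systems.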

1. Define a JSON schema specifying the expected fields and their properties.

In [3]:
json_schema = {
    "title": "CountryDetails",
    "description": "Get country information",
    "type": "object",
    "properties":{
        "Name":{
            "type": "string",
            "description": "The name of the country"
        },
        "Population":{
            "type": "string",
            "description": "Last known population with year"
        },
        "Capital":{
            "type":"string",
            "description": "Capital of the country"
        }
    },
    "required":["Name","Population","Capital"]
}

2. Use with_structured_output with the JSON schema

In [4]:
structured_llm = llm.with_structured_output(json_schema, include_raw=True)
structured_llm.invoke("Tell me about India")

{'raw': AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_WM4I7EunLjcLEx6NxatD8Dww', 'function': {'arguments': '{"Name":"India","Population":"1.41 billion (2021)","Capital":"New Delhi"}', 'name': 'CountryDetails'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 22, 'prompt_tokens': 80, 'total_tokens': 102, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_0aa8d3e20b', 'finish_reason': 'stop', 'logprobs': None}, id='run-5085363f-e6be-4c3b-9b62-a15fd56164e1-0', tool_calls=[{'name': 'CountryDetails', 'args': {'Name': 'India', 'Population': '1.41 billion (2021)', 'Capital': 'New Delhi'}, 'id': 'call_WM4I7EunLjcLEx6NxatD8Dww', 'type': 'tool_call'}], usage_metadata={'input_tokens': 80, 'output_tokens': 22, 
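
With `include_raw=True`, LangChain returns a dict with the keys `raw`, `parsed`, and `parsing_error` instead of the parsed object alone. Since a dict schema is not validated on the Python side, it can be worth checking the parsed payload yourself. The sketch below uses a hand-built stand-in for the result (in a real run `raw` holds an `AIMessage`), and `check_required` is a hypothetical helper, not a LangChain function. It only checks required keys and string types, not the full JSON Schema specification.

```python
# Stand-in for the value returned by structured_llm.invoke(...) with include_raw=True.
# In a real run, "raw" holds an AIMessage rather than None.
result = {
    "raw": None,
    "parsed": {"Name": "India", "Population": "1.41 billion (2021)", "Capital": "New Delhi"},
    "parsing_error": None,
}

REQUIRED = ["Name", "Population", "Capital"]  # mirrors json_schema["required"]

def check_required(payload, required):
    """Return a list of problems: missing keys or non-string values."""
    problems = []
    for key in required:
        if key not in payload:
            problems.append(f"missing field: {key}")
        elif not isinstance(payload[key], str):
            problems.append(f"field {key} is not a string")
    return problems

# Surface parsing failures first, then check the parsed payload.
if result["parsing_error"] is not None:
    raise RuntimeError(f"model output could not be parsed: {result['parsing_error']}")

problems = check_required(result["parsed"], REQUIRED)
print(problems or "all required fields present")
```

For stricter checks, a dedicated JSON Schema validator library would be a better fit than this hand-rolled helper.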

# Modelsmith

- Modelsmith is a library designed to integrate LLMs with Python's Pydantic validation, enabling structured outputs while supporting multiple data types and models.
- It generates structured outputs validated against Pydantic schemas and allows combining Python-native types like lists or complex objects in responses.
- Simplifies schema definition and response validation with high-level APIs.
- Supports multiple model types (e.g., OpenAI, Vertex AI).
- Enables structured outputs for more complex data formats.


In [None]:
!pip install modelsmith

In [7]:
from modelsmith import Forge, OpenAIModel
from pydantic import BaseModel, Field

1. Define a schema using Pydantic's BaseModel

In [8]:
# Pydantic model you want to receive as the response
class User(BaseModel):
    name: str = Field(description="The person's name")
    age: int = Field(description="The person's age")
    city: str = Field(description="The city where the person lives")
    country: str = Field(description="The country where the person lives")

2. Create a Forge instance with the LLM and the schema and invoke it

In [9]:
# Forge instance that ties the LLM to the response schema
forge = Forge(model=OpenAIModel('gpt-4o'), response_model=User)
user = forge.generate("Kauser tp 23. Lives in Bengaluru, India")
print(user)

name='Kauser' age=23 city='Bengaluru' country='India'



# LlamaIndex

In [None]:
!pip install llama-index-llms-ollama

In [None]:
!pip install llama-index-vector-stores-chroma

In [1]:
from pydantic import BaseModel
from enum import Enum
from typing import List

class Sentiment(str, Enum):
    positive = "positive"
    negative = "negative"
    neutral = "neutral"

class AspectSentiment(BaseModel):
    aspect: str
    sentiment: Sentiment

class ReviewAnalysis(BaseModel):
    review_text: str
    aspects: List[AspectSentiment]
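
The nested models above constrain both the shape of the result and the allowed sentiment labels. As a quick offline check (no LLM involved, Pydantic v2 assumed), the sketch below repeats the three classes and validates two hand-written JSON strings that stand in for model output.

```python
from enum import Enum
from typing import List

from pydantic import BaseModel, ValidationError

class Sentiment(str, Enum):
    positive = "positive"
    negative = "negative"
    neutral = "neutral"

class AspectSentiment(BaseModel):
    aspect: str
    sentiment: Sentiment

class ReviewAnalysis(BaseModel):
    review_text: str
    aspects: List[AspectSentiment]

# Hand-written JSON with valid labels: parses into nested model instances.
valid_json = (
    '{"review_text": "Great camera, weak battery.", "aspects": ['
    '{"aspect": "camera", "sentiment": "positive"}, '
    '{"aspect": "battery", "sentiment": "negative"}]}'
)
analysis = ReviewAnalysis.model_validate_json(valid_json)
print([(a.aspect, a.sentiment.value) for a in analysis.aspects])

# Hand-written JSON with a label outside the enum: rejected by validation.
invalid_json = (
    '{"review_text": "Great camera.", "aspects": ['
    '{"aspect": "camera", "sentiment": "amazing"}]}'
)
try:
    ReviewAnalysis.model_validate_json(invalid_json)
    accepted = True
except ValidationError:
    accepted = False
print("invalid label accepted:", accepted)
```

Using an `Enum` for the sentiment keeps free-form labels such as "amazing" out of downstream code.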

In [2]:
from llama_index.llms.ollama import Ollama
from llama_index.core.program import LLMTextCompletionProgram

llm = Ollama(model="llama2", request_timeout=120.0)

In [None]:
prompt_template_str = """\
You are an expert in sentiment analysis. Analyze the following product review and extract key aspects mentioned along with the sentiment (positive, negative, or neutral) associated with each aspect.

Review: "{review_text}"

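Provide the results in JSON format, matching the structure of the ReviewAnalysis model:
- review_text: the original review text
- aspects: a list of objects, each with:
  - aspect: the product aspect mentioned
  - sentiment: the sentiment towards that aspect (positive, negative, or neutral)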
"""
program = LLMTextCompletionProgram.from_defaults(
    llm=llm,
    output_cls=ReviewAnalysis,
    prompt_template_str=prompt_template_str,
    verbose=True,
)
review_text = "The camera quality of this phone is fantastic, but the battery life is disappointing."

output = program(review_text=review_text)

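ConnectError: [Errno 111] Connection refused

The call fails here because no Ollama server was reachable on the local machine. Start Ollama and pull the model first (for example `ollama serve` and `ollama pull llama2`), then rerun the cell.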
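
A pre-flight check can catch this kind of connection failure before the program runs. The sketch below uses only the standard library; it assumes Ollama's default address `http://localhost:11434`, and `ollama_reachable` is a hypothetical helper name, not part of LlamaIndex or Ollama.

```python
import urllib.error
import urllib.request

def ollama_reachable(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP server answers with status 200 at base_url, False otherwise."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and HTTP error responses.
        return False

if ollama_reachable():
    print("Ollama is reachable; the program call should be able to connect.")
else:
    print("No server found; start Ollama before calling the program.")
```

Running this before `program(review_text=review_text)` turns a long traceback into a short, actionable message.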