# Structured Output with langchain-fused-model

This notebook demonstrates how to use Pydantic models to get structured, validated outputs from any ChatModel, with automatic fallback for models that don't support native structured output.

## Why Structured Output?

Structured output allows you to:
- Get validated, type-safe responses
- Parse complex data reliably
- Integrate LLM outputs directly into your application
- Avoid manual parsing and validation

## Setup

In [None]:
import os
from typing import List, Optional
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_fused_model import MultiModelManager, RoutingStrategy

# Set your API keys
# os.environ["OPENAI_API_KEY"] = "your-openai-key"
# os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"

# Create models (a low temperature keeps extraction output more consistent)
models = [
    ChatOpenAI(model="gpt-3.5-turbo", temperature=0),
    ChatAnthropic(model="claude-3-sonnet-20240229", temperature=0),
]

manager = MultiModelManager(
    models=models,
    strategy=RoutingStrategy.PRIORITY
)

print("Setup complete!")

## Example 1: Simple Person Schema

Let's start with a simple schema for extracting person information:

In [None]:
class Person(BaseModel):
    """Information about a person."""
    name: str = Field(description="The person's full name")
    age: int = Field(description="The person's age in years (age at death, if deceased)")
    occupation: str = Field(description="The person's job or profession")
    nationality: Optional[str] = Field(default=None, description="The person's nationality")

# Create structured output runnable
person_extractor = manager.with_structured_output(Person)

# Extract person information
result = person_extractor.invoke(
    "Tell me about Marie Curie, the famous physicist who won Nobel Prizes"
)

print("\n=== Extracted Person Information ===")
print(f"Name: {result.name}")
print(f"Age: {result.age}")
print(f"Occupation: {result.occupation}")
print(f"Nationality: {result.nationality}")
print(f"\nType: {type(result)}")
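To see what the model is actually asked to fill in, you can inspect the JSON Schema that Pydantic generates. This sketch assumes Pydantic v2 (`model_json_schema()`); native tool calling consumes a schema of this shape, but exactly how langchain-fused-model embeds it in its fallback prompt is not shown here. It uses a trimmed copy of `Person` so the cell runs on its own:

```python
from typing import Optional

from pydantic import BaseModel, Field


class Person(BaseModel):
    """Information about a person."""
    name: str = Field(description="The person's full name")
    age: int
    occupation: str
    nationality: Optional[str] = Field(default=None, description="The person's nationality")


schema = Person.model_json_schema()

# Only fields without a default are listed as required
print(schema["required"])  # ['name', 'age', 'occupation']

# Field descriptions are carried into the schema the model sees
print(schema["properties"]["name"]["description"])  # The person's full name
```

Because `nationality` has a default, the model may omit it and the parsed object will simply hold `None`.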

## Example 2: Complex Nested Schema

Let's try a more complex schema with nested objects:

In [None]:
class Address(BaseModel):
    """A physical address."""
    street: str
    city: str
    country: str
    postal_code: Optional[str] = None

class Company(BaseModel):
    """Information about a company."""
    name: str = Field(description="Company name")
    founded_year: int = Field(description="Year the company was founded")
    headquarters: Address = Field(description="Company headquarters location")
    industry: str = Field(description="Primary industry")
    employee_count: Optional[int] = Field(default=None, description="Number of employees")

# Create structured output runnable
company_extractor = manager.with_structured_output(Company)

# Extract company information
result = company_extractor.invoke(
    "Tell me about Apple Inc., the technology company founded in 1976 in Cupertino, California, USA"
)

print("\n=== Extracted Company Information ===")
print(f"Name: {result.name}")
print(f"Founded: {result.founded_year}")
print(f"Industry: {result.industry}")
print(f"Headquarters: {result.headquarters.city}, {result.headquarters.country}")
if result.employee_count is not None:
    print(f"Employees: {result.employee_count:,}")
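The nested `headquarters` field is validated recursively: a plain dict in the model's output becomes an `Address` instance. The sketch below assumes Pydantic v2 and uses a hand-written payload in place of a real model response; it also shows lax-mode coercion turning the string `"1976"` into an `int`:

```python
from typing import Optional

from pydantic import BaseModel


class Address(BaseModel):
    street: str
    city: str
    country: str
    postal_code: Optional[str] = None


class Company(BaseModel):
    name: str
    founded_year: int
    headquarters: Address
    industry: str
    employee_count: Optional[int] = None


# Hand-written stand-in for a model's JSON output
raw = {
    "name": "Apple Inc.",
    "founded_year": "1976",  # a numeric string; lax mode coerces it to int
    "headquarters": {"street": "One Apple Park Way", "city": "Cupertino", "country": "USA"},
    "industry": "Technology",
}

company = Company.model_validate(raw)
print(type(company.headquarters).__name__)  # Address
print(company.founded_year)                 # 1976
print(company.employee_count)               # None
```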

## Example 3: List of Objects

Extract multiple items at once by wrapping a `List[...]` field in a container model:

In [None]:
class Book(BaseModel):
    """Information about a book."""
    title: str
    author: str
    year: int
    genre: str

class BookList(BaseModel):
    """A list of books."""
    books: List[Book] = Field(description="List of books")

# Create structured output runnable
book_extractor = manager.with_structured_output(BookList)

# Extract book information
result = book_extractor.invoke(
    """List 3 famous science fiction books:
    1. Dune by Frank Herbert (1965)
    2. Foundation by Isaac Asimov (1951)
    3. Neuromancer by William Gibson (1984)
    """
)

print("\n=== Extracted Book List ===")
for i, book in enumerate(result.books, 1):
    print(f"\n{i}. {book.title}")
    print(f"   Author: {book.author}")
    print(f"   Year: {book.year}")
    print(f"   Genre: {book.genre}")
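Wrapping the list in `BookList` rather than requesting `List[Book]` directly is a common pattern, because tool-calling APIs such as OpenAI function calling expect the top-level schema to be a JSON object. Assuming Pydantic v2, the generated schema is an object whose nested `Book` definition lives under `$defs`, and a JSON payload (hand-written here) validates straight into typed objects:

```python
from typing import List

from pydantic import BaseModel, Field


class Book(BaseModel):
    title: str
    author: str
    year: int
    genre: str


class BookList(BaseModel):
    books: List[Book] = Field(description="List of books")


schema = BookList.model_json_schema()
print(schema["type"])         # object
print(list(schema["$defs"]))  # ['Book']

# Hand-written stand-in for a model's JSON output
payload = (
    '{"books": [{"title": "Dune", "author": "Frank Herbert",'
    ' "year": 1965, "genre": "Science fiction"}]}'
)
parsed = BookList.model_validate_json(payload)
print(parsed.books[0].title, parsed.books[0].year)  # Dune 1965
```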

## Example 4: Data Extraction from Text

Extract structured data from unstructured text:

In [None]:
class Event(BaseModel):
    """An event with date and description."""
    date: str = Field(description="Date of the event")
    event: str = Field(description="Description of what happened")
    location: Optional[str] = Field(default=None, description="Where it happened")

class Timeline(BaseModel):
    """A timeline of events."""
    events: List[Event] = Field(description="List of events in chronological order")

# Create structured output runnable
timeline_extractor = manager.with_structured_output(Timeline)

# Extract timeline from text
text = """
The Apollo 11 mission was a historic spaceflight. On July 16, 1969, the mission launched from 
Kennedy Space Center in Florida. Four days later, on July 20, 1969, Neil Armstrong and Buzz Aldrin 
landed on the Moon at the Sea of Tranquility. Armstrong became the first person to walk on the Moon. 
The crew returned to Earth on July 24, 1969, splashing down in the Pacific Ocean.
"""

result = timeline_extractor.invoke(f"Extract the timeline of events from this text: {text}")

print("\n=== Extracted Timeline ===")
for event in result.events:
    location_str = f" ({event.location})" if event.location else ""
    print(f"\n{event.date}{location_str}")
    print(f"  {event.event}")
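`Event.date` is a free-form `str`, so the model may return "July 16, 1969" in one run and "1969-07-16" in another. If you need to sort or compare dates, one option is a hypothetical variant of the schema (`DatedEvent`, not used in the example above) that declares the field as `datetime.date`. Assuming Pydantic v2, ISO 8601 strings are parsed into real dates and other formats are rejected:

```python
import datetime
from typing import List, Optional

from pydantic import BaseModel, Field, ValidationError


class DatedEvent(BaseModel):
    """Hypothetical variant of Event with a typed date."""
    date: datetime.date = Field(description="Date of the event in YYYY-MM-DD format")
    event: str
    location: Optional[str] = None


class DatedTimeline(BaseModel):
    events: List[DatedEvent]


timeline = DatedTimeline.model_validate({
    "events": [
        {"date": "1969-07-20", "event": "Lunar landing", "location": "Sea of Tranquility"},
        {"date": "1969-07-16", "event": "Launch", "location": "Kennedy Space Center"},
    ]
})

# Typed dates can be sorted and subtracted directly
ordered = sorted(timeline.events, key=lambda e: e.date)
print([e.event for e in ordered])                  # ['Launch', 'Lunar landing']
print((ordered[-1].date - ordered[0].date).days)  # 4

# A non-ISO date string fails validation
try:
    DatedEvent.model_validate({"date": "July 16, 1969", "event": "Launch"})
except ValidationError:
    print("rejected non-ISO date")
```

The trade-off is strictness: a response that uses any other date format fails validation instead of being passed through as text.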

## Example 5: Classification Task

Use structured output for classification:

In [None]:
from enum import Enum

class Sentiment(str, Enum):
    """Sentiment classification."""
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

class SentimentAnalysis(BaseModel):
    """Sentiment analysis result."""
    text: str = Field(description="The analyzed text")
    sentiment: Sentiment = Field(description="The detected sentiment")
    confidence: float = Field(description="Confidence score between 0 and 1", ge=0, le=1)
    reasoning: str = Field(description="Brief explanation of the classification")

# Create structured output runnable
sentiment_analyzer = manager.with_structured_output(SentimentAnalysis)

# Analyze sentiments
texts = [
    "I absolutely love this product! It exceeded all my expectations.",
    "The service was terrible and the staff was rude.",
    "It's okay, nothing special but not bad either."
]

print("\n=== Sentiment Analysis ===")
for text in texts:
    result = sentiment_analyzer.invoke(f"Analyze the sentiment of this text: {text}")
    print(f"\nText: {text[:50]}{'...' if len(text) > 50 else ''}")
    print(f"Sentiment: {result.sentiment.value.upper()}")
    print(f"Confidence: {result.confidence:.2f}")
    print(f"Reasoning: {result.reasoning}")
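The `Sentiment` enum and the `ge`/`le` bounds on `confidence` both end up in the JSON Schema, and both are enforced again when the response is parsed. This sketch assumes Pydantic v2 and uses hand-written payloads instead of live model calls; it shows the allowed labels in the schema, a valid parse, and two rejected payloads:

```python
from enum import Enum

from pydantic import BaseModel, Field, ValidationError


class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


class SentimentAnalysis(BaseModel):
    text: str
    sentiment: Sentiment
    confidence: float = Field(ge=0, le=1)
    reasoning: str


# The enum values are listed in the schema, constraining the model's choices
schema = SentimentAnalysis.model_json_schema()
print(schema["$defs"]["Sentiment"]["enum"])  # ['positive', 'negative', 'neutral']

ok = SentimentAnalysis.model_validate(
    {"text": "Great!", "sentiment": "positive", "confidence": 0.9, "reasoning": "Enthusiastic tone"}
)
print(ok.sentiment is Sentiment.POSITIVE)  # True

# An unknown label and an out-of-range score are both rejected
failed_fields = []
for bad in ({"sentiment": "mixed", "confidence": 0.5}, {"sentiment": "neutral", "confidence": 1.5}):
    try:
        SentimentAnalysis.model_validate({"text": "...", "reasoning": "...", **bad})
    except ValidationError as e:
        failed_fields.append(e.errors()[0]["loc"])
print(failed_fields)  # [('sentiment',), ('confidence',)]
```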

## Example 6: Validation and Error Handling

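Pydantic enforces the constraints declared with `Field` (length limits, numeric bounds) when the response is parsed, so output that violates them is rejected rather than returned as an invalid object: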

In [None]:
class Product(BaseModel):
    """Product information with validation."""
    name: str = Field(min_length=1, max_length=100)
    price: float = Field(gt=0, description="Price must be positive")
    quantity: int = Field(ge=0, description="Quantity must be non-negative")
    in_stock: bool = Field(description="Whether the product is in stock")

# Create structured output runnable
product_extractor = manager.with_structured_output(Product)

# Extract product information
try:
    result = product_extractor.invoke(
        "Extract product info: Laptop Pro, priced at $1299.99, 15 units available, currently in stock"
    )
    
    print("\n=== Extracted Product Information ===")
    print(f"Name: {result.name}")
    print(f"Price: ${result.price:.2f}")
    print(f"Quantity: {result.quantity}")
    print(f"In Stock: {result.in_stock}")
    
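except Exception as e:
    # Broad catch: covers parsing/validation failures as well as API or network errors
    print(f"\nExtraction failed ({type(e).__name__}): {e}")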
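When validation does fail, catching `pydantic.ValidationError` (rather than a bare `Exception`) tells you exactly which constraints were broken. Whether langchain-fused-model surfaces Pydantic's exception directly or wraps it is not documented here, so this sketch validates a hand-written payload against `Product` locally (Pydantic v2 assumed):

```python
from pydantic import BaseModel, Field, ValidationError


class Product(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    price: float = Field(gt=0)
    quantity: int = Field(ge=0)
    in_stock: bool


# Hypothetical bad response: negative price and quantity.
# "yes" is still accepted for in_stock, because lax mode coerces it to True.
bad = {"name": "Laptop Pro", "price": -5, "quantity": -1, "in_stock": "yes"}

try:
    Product.model_validate(bad)
except ValidationError as e:
    # Map each failing field to the kind of constraint it violated
    problems = {err["loc"][0]: err["type"] for err in e.errors()}
    print(problems)  # {'price': 'greater_than', 'quantity': 'greater_than_equal'}
```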

## How It Works

The `with_structured_output` method:

1. **Detects Native Support**: Checks if the model has native structured output (like OpenAI's function calling)
2. **Uses Native When Available**: Delegates to the model's native method for better performance
3. **Falls Back Gracefully**: If native support isn't available:
   - Injects JSON schema instructions into the prompt
   - Extracts JSON from the response using regex
   - Validates against your Pydantic schema
4. **Handles Errors**: Provides clear error messages if parsing or validation fails
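The fallback path described above can be sketched in a few lines. This is a hypothetical illustration, not langchain-fused-model's actual implementation: `build_fallback_prompt` and `parse_fallback_response` are made-up names, the prompt wording is an assumption, and the regex simply grabs the span from the first `{` to the last `}`. Pydantic v2 is assumed for `model_json_schema()` and `model_validate_json()`:

```python
import json
import re
from typing import Type, TypeVar

from pydantic import BaseModel

T = TypeVar("T", bound=BaseModel)

# Greedy match from the first "{" to the last "}" across newlines
_JSON_SPAN = re.compile(r"\{.*\}", re.DOTALL)


def build_fallback_prompt(user_input: str, schema: Type[BaseModel]) -> str:
    """Hypothetical helper: append JSON Schema instructions to the user's request."""
    return (
        f"{user_input}\n\n"
        "Respond only with a JSON object that conforms to this JSON Schema:\n"
        f"{json.dumps(schema.model_json_schema(), indent=2)}"
    )


def parse_fallback_response(text: str, schema: Type[T]) -> T:
    """Hypothetical helper: extract the JSON object from free-form text and validate it."""
    match = _JSON_SPAN.search(text)
    if match is None:
        raise ValueError("No JSON object found in the model response")
    # Raises pydantic.ValidationError on malformed JSON or schema violations
    return schema.model_validate_json(match.group(0))


class Contact(BaseModel):
    name: str
    age: int


prompt = build_fallback_prompt("Extract the contact details.", Contact)

# Canned reply standing in for a model without native structured output
reply = 'Sure! Here is the data:\n{"name": "Jane Doe", "age": 42}\nAnything else?'
contact = parse_fallback_response(reply, Contact)
print(contact.name, contact.age)  # Jane Doe 42
```

The greedy brace match is deliberately simple; it breaks if the reply contains stray braces outside the JSON, which is one reason native structured output is preferable when the model supports it.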

## Best Practices

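1. **Write Clear Field Descriptions**: Field descriptions become part of the JSON schema the model sees, so use them to say exactly what each field should contain
2. **Add Constraints**: Use `Field` constraints (`min_length`, `ge`, `le`, etc.) or custom validators to reject bad values
3. **Give Optional Fields a Default**: Use `Optional[T] = None` (or `Field(default=None)`) for fields that might not be present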
4. **Use Enums for Categories**: Constrain choices with Enum types
5. **Test with Different Models**: Some models handle structured output better than others
6. **Provide Clear Prompts**: Be specific about what information you want extracted
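Item 3 hides a Pydantic v2 subtlety: `Optional[T]` only means the value may be `None`; without a default, the field is still required and must appear in the output. The sketch below compares the two declarations (Pydantic v2 assumed):

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class WithoutDefault(BaseModel):
    nationality: Optional[str]  # may be None, but must still be present


class WithDefault(BaseModel):
    nationality: Optional[str] = None  # may be omitted entirely


print(WithDefault.model_validate({}).nationality)      # None
print(WithoutDefault.model_json_schema()["required"])  # ['nationality']

try:
    WithoutDefault.model_validate({})
except ValidationError as e:
    missing_type = e.errors()[0]["type"]
    print(missing_type)  # missing
```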

## Conclusion

This notebook demonstrated:
- Simple and complex Pydantic schemas
- Nested objects and lists
- Data extraction from unstructured text
- Classification tasks
- Validation and error handling
- Best practices for structured output

Structured output makes it easy to integrate LLM responses directly into your application with type safety and validation!