# Structured Responses with Pydantic and Ollama

*Using IBM Granite Models with Local Ollama*

**Author:** Vipul Mahajan  
**Email:** vipmaha1@in.ibm.com

This recipe demonstrates how to generate reliable, structured responses from IBM Granite models using Pydantic for schema definition and validation, with **local Ollama** as the LLM provider. Unlike free-form text generation, structured responses ensure consistent, machine-readable outputs that integrate seamlessly with software systems.

## 🎯 What Makes This Recipe Unique

This notebook complements the [Entity Extraction recipe](../Entity-Extraction/entity_extraction.ipynb) by focusing on **local, offline deployment** with Ollama instead of cloud APIs (Replicate/Watsonx). It's designed for users who need:

- **Privacy & Security**: Keep data local, no cloud API calls
- **Offline Capability**: Work without internet connectivity
- **Cost Control**: Eliminate per-request API costs
- **Production Patterns**: Advanced error handling, debugging, and monitoring

While the Entity Extraction recipe is excellent for quick cloud-based extraction, this notebook targets **production-ready, local inference systems** with enterprise requirements.

## What This Notebook Covers:

### 🔧 **Core Implementation**
- **Two Approaches**: Both LangChain's `with_structured_output` (clean, simple) AND manual parsing (fine-grained control)
- **Pydantic Schema Design**: Define structured data models with validation rules
- **Ollama Integration**: Connect to local Granite models for private, offline processing
- **Error Handling**: Robust parsing with multiple strategies and automatic retries
- **Type Safety**: Runtime validation of model output against Python type annotations

### 📊 **Practical Examples**
- **Product Review Analysis**: Extract ratings, sentiment, pros/cons from customer reviews
- **Research Paper Parsing**: Complex nested extraction from academic papers with author details
- **Advanced Validation**: Custom validators, computed fields, and consistency checks
- **Error Scenario Testing**: Handle ambiguous, contradictory, and malformed inputs

### 🧪 **Testing & Validation**
- **Reliability Testing**: Multi-run consistency analysis with performance metrics
- **Error Handling Demo**: Graceful failure scenarios and recovery mechanisms  
- **Performance Comparison**: Pydantic vs unstructured approaches with detailed analysis
- **Debug Utilities**: Step-by-step troubleshooting tools for production use

### 📚 **Production Guidance**
- **Best Practices**: Schema design, prompt engineering, and optimization strategies
- **Troubleshooting Guide**: Common issues and systematic resolution approaches
- **Method Comparison**: When to use `with_structured_output` vs manual parsing

## Key Benefits of This Approach:

1. **Type Safety**: Pydantic validates data against type annotations at runtime, and the same annotations work with static type checkers such as mypy
2. **Error Handling**: Robust parsing with automatic retries for malformed responses
3. **Reliability**: Consistent output format across multiple model runs
4. **Local Control**: Using Ollama eliminates API dependencies and ensures data privacy
5. **Validation**: Built-in field validation, constraints, and custom validators
6. **Flexibility**: Two methods (LangChain native vs custom) for different use cases

## Prerequisites

This recipe requires:
1. [Ollama](https://ollama.ai/) installed locally
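2. An IBM Granite model pulled in Ollama: `ollama pull ibm/granite4:latest` (the tag configured below). If you pull a different tag, such as `ibm/granite4:3b`, update `ollama_model` to match.
3. Python packages: `pydantic`, `langchain-ollama`, `transformers`, `json5`, `tenacity`, `jinja2`, and the `ibm-granite-community` utilities (all installed below)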

## Install Dependencies

In [None]:
! echo "::group::Install Dependencies"
%pip install uv
! uv pip install "git+https://github.com/ibm-granite-community/utils.git" \
    pydantic \
    langchain-ollama \
    transformers \
    json5 \
    tenacity \
    jinja2
! echo "::endgroup::"

## Setup and Configuration

In [None]:
import json
import json5
import time
from typing import List, Optional, Dict, Any, Type, TypeVar
from pydantic import BaseModel, Field, ValidationError
from langchain_ollama import OllamaLLM, ChatOllama
from transformers import AutoTokenizer
from tenacity import retry, stop_after_attempt, wait_exponential
from ibm_granite_community.notebook_utils import wrap_text

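# Set up the Granite tokenizer; its chat template formats prompts for the manual approach.
# Note: this is a Granite 3.3 tokenizer while Ollama serves a Granite 4 model, so the
# template may not match exactly. If possible, point model_path at the matching Granite 4 repo.
model_path = "ibm-granite/granite-3.3-8b-instruct"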
tokenizer = AutoTokenizer.from_pretrained(model_path)

ollama_model = "ibm/granite4:latest"

# Approach 1: OllamaLLM for manual structured response (fine-grained control)
llm_manual = OllamaLLM(
    model=ollama_model,
    temperature=0.1,  # Low temperature for more consistent outputs
    num_predict=1024,  # Max tokens to generate
    top_p=0.9,
    top_k=40
)

# Approach 2: ChatOllama for with_structured_output (LangChain native)
llm_chat = ChatOllama(
    model=ollama_model,
    temperature=0.1,
    num_predict=1024
)

print(f"✓ Tokenizer loaded: {model_path}")
print(f"✓ Ollama model configured: {ollama_model}")
print("✓ Two LLM instances ready:")
print("  - OllamaLLM (manual approach with custom parsing)")
print("  - ChatOllama (with_structured_output method)")

## Basic Pydantic Schema Definition

Let's start with simple schema definitions to understand how Pydantic models work:

In [None]:
from pydantic import BaseModel, Field, field_validator
from typing import List, Optional
from enum import Enum

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

class ProductReview(BaseModel):
    """Schema for extracting structured information from product reviews."""
    
    product_name: str = Field(description="Name of the product being reviewed")
    rating: float = Field(ge=1, le=5, description="Rating from 1 to 5 stars (can include decimals like 4.5)")
    sentiment: Sentiment = Field(description="Overall sentiment of the review")
    pros: List[str] = Field(description="Positive aspects mentioned in the review")
    cons: List[str] = Field(description="Negative aspects or complaints")
    would_recommend: bool = Field(description="Whether the reviewer recommends the product")
    review_summary: str = Field(description="Brief summary of the review in 1-2 sentences")
    
    @field_validator('pros', 'cons')
    @classmethod
    def validate_lists_not_empty_strings(cls, v):
        """Ensure list items are not empty strings"""
        return [item.strip() for item in v if item.strip()]

# Demonstrate schema generation
print("Generated JSON Schema:")
print(json.dumps(ProductReview.model_json_schema(), indent=2))
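To see what the field constraints buy you, the sketch below validates two hand-written dictionaries standing in for model output (the values are illustrative, not real model responses). It uses a trimmed, renamed copy of the schema (`DemoReview`, `DemoSentiment`, both hypothetical) so it runs on its own without overwriting `ProductReview`. The first dictionary passes and has its blank list item removed by the validator; the second, with an out-of-range rating and an unknown sentiment, is rejected with a `ValidationError` that names each failing field.

```python
from enum import Enum
from typing import List

from pydantic import BaseModel, Field, ValidationError, field_validator


class DemoSentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


class DemoReview(BaseModel):
    """Trimmed copy of ProductReview, renamed so it does not overwrite the notebook's class."""
    product_name: str
    rating: float = Field(ge=1, le=5)
    sentiment: DemoSentiment
    pros: List[str]
    cons: List[str]
    would_recommend: bool

    @field_validator("pros", "cons")
    @classmethod
    def drop_blank_items(cls, v: List[str]) -> List[str]:
        # Same cleanup as the notebook's validator: strip items and drop empty ones
        return [item.strip() for item in v if item.strip()]


# Hand-written stand-ins for model output (illustrative values, not real responses)
good_output = {
    "product_name": "Sony WH-1000XM4",
    "rating": 4.5,
    "sentiment": "positive",
    "pros": ["noise cancellation", "  ", "battery life"],
    "cons": ["bulky case"],
    "would_recommend": True,
}
bad_output = {**good_output, "rating": 6, "sentiment": "ecstatic"}

demo_review = DemoReview.model_validate(good_output)
print(demo_review.sentiment.value, demo_review.pros)  # positive ['noise cancellation', 'battery life']

failing_fields = []
try:
    DemoReview.model_validate(bad_output)
except ValidationError as err:
    # Each error's "loc" tuple starts with the name of the offending field
    failing_fields = sorted(str(e["loc"][0]) for e in err.errors())
print(failing_fields)  # ['rating', 'sentiment']
```

Because Pydantic collects every field error rather than stopping at the first one, a single validation pass tells you everything the model got wrong.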

## Core Structured Response Function

This function handles the complete workflow: prompt formatting, model invocation, parsing, and validation with error handling:

In [None]:
T = TypeVar('T', bound=BaseModel)

def create_structured_prompt(user_message: str, schema_model: Type[BaseModel], system_prompt: Optional[str] = None) -> str:
    """Create a properly formatted prompt for structured response generation."""
    
    if system_prompt is None:
        system_prompt = """You are a precise data extraction assistant. Your task is to analyze the given text and extract information according to the specified schema.

CRITICAL INSTRUCTIONS:
- Respond ONLY with valid JSON that matches the schema exactly
- Do not include any explanations, comments, or additional text
- If information is not available, use null for optional fields or reasonable defaults
- Ensure all required fields are present
- Follow the data types specified in the schema strictly"""
    
    schema_json = json.dumps(schema_model.model_json_schema(), indent=2)
    
    # Format the instructions, schema, and input text with the Granite chat template
    conversation = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Extract structured data from this text according to the schema:\n\n{schema_json}\n\nText: {user_message}"}
    ]
    
    formatted_prompt = tokenizer.apply_chat_template(
        conversation,
        add_generation_prompt=True,
        tokenize=False
    )
    
    return formatted_prompt

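# reraise=True re-raises the last ValueError instead of wrapping it in tenacity's RetryError
@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10), reraise=True)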
def get_structured_response(user_message: str, schema_model: Type[T], system_prompt: Optional[str] = None) -> T:
    """Get a structured response with automatic retry on parsing failures (Manual Approach)."""
    
    prompt = create_structured_prompt(user_message, schema_model, system_prompt)
    
    # Get response from Ollama using manual LLM
    response = llm_manual.invoke(prompt)
    
    # Try multiple parsing strategies
    parsing_errors = []
    
    # Strategy 1: Direct JSON parsing
    try:
        parsed_json = json.loads(response.strip())
        return schema_model.model_validate(parsed_json)
    except (json.JSONDecodeError, ValidationError) as e:
        parsing_errors.append(f"Direct JSON parsing: {str(e)}")
    
    # Strategy 2: Extract the JSON object from the response (handles extra text around it)
    start_idx = response.find('{')
    end_idx = response.rfind('}')
    if start_idx != -1 and end_idx > start_idx:
        candidate = response[start_idx:end_idx+1]
    else:
        candidate = response.strip()
    try:
        parsed_json = json.loads(candidate)
        return schema_model.model_validate(parsed_json)
    except (json.JSONDecodeError, ValidationError) as e:
        parsing_errors.append(f"Extracted JSON parsing: {str(e)}")
    
    # Strategy 3: Use json5 for more lenient parsing (trailing commas, single quotes, comments)
    try:
        parsed_json = json5.loads(candidate)
        return schema_model.model_validate(parsed_json)
    except Exception as e:
        parsing_errors.append(f"JSON5 parsing: {str(e)}")
    
    # If all strategies fail, raise detailed error
    error_msg = f"Failed to parse response after multiple attempts.\nResponse: {response[:200]}...\nErrors: {'; '.join(parsing_errors)}"
    raise ValueError(error_msg)

print("✓ Manual structured response functions defined")
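Strategy 2 above slices from the first `{` to the last `}`, which breaks when the reply contains a second JSON object or stray braces in trailing prose. A more robust option, sketched here as an alternative rather than the notebook's method, uses the standard library's `json.JSONDecoder.raw_decode`, which parses one JSON value starting at a given index and ignores whatever follows it. The helper name `extract_first_json_object` is hypothetical.

```python
import json
from typing import Optional

_decoder = json.JSONDecoder()


def extract_first_json_object(text: str) -> Optional[dict]:
    """Return the first JSON object embedded in `text`, or None if there is none.

    Instead of slicing from the first '{' to the last '}', try each '{' in turn
    and let the standard-library decoder decide where that object ends, so a
    second object or trailing prose cannot corrupt the result.
    """
    start = text.find("{")
    while start != -1:
        try:
            obj, _end = _decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            pass  # not a valid object at this brace; try the next one
        start = text.find("{", start + 1)
    return None


# First/last-brace slicing would yield '{"a": 1} Note: {"b": 2}', which json.loads rejects
reply = 'Sure! {"a": 1} Note: {"b": 2}'
print(extract_first_json_object(reply))  # {'a': 1}
```

This also copes with replies wrapped in Markdown code fences, since the fence characters sit outside the braces. You could swap it in for Strategy 2 in `get_structured_response` and pass the result to `schema_model.model_validate`.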

## Debug and Troubleshooting Utilities (OPTIONAL)

For debugging issues with structured responses, these utility functions provide step-by-step error tracking:

In [None]:
def debug_structured_response(user_message: str, schema_model: Type[T], system_prompt: Optional[str] = None, show_prompt: bool = False) -> T:
    """
    Debug version of get_structured_response with detailed step-by-step logging.
    
    Args:
        user_message: Text to extract structured data from
        schema_model: Pydantic model class to validate against
        system_prompt: Optional custom system prompt
        show_prompt: Whether to display the full prompt sent to model
    
    Returns:
        Validated Pydantic model instance
        
    Example usage:
        debug_result = debug_structured_response("Great product! 5 stars!", ProductReview)
    """
    
    print("üêõ DEBUG MODE - Step-by-step execution:")
    print("=" * 50)
    
    try:
        # Step 1: Create prompt
        print("1Ô∏è‚É£ Creating structured prompt...")
        prompt = create_structured_prompt(user_message, schema_model, system_prompt)
        print("   ‚úÖ Prompt created successfully")
        
        if show_prompt:
            print(f"   üìù Full prompt preview:\n{prompt[:500]}...")
        
        # Step 2: Get response from model
        print("\n2Ô∏è‚É£ Calling Ollama model...")
        response = llm.invoke(prompt)
        print(f"   ‚úÖ Got response ({len(response)} characters)")
        print(f"   üìÑ Response preview: {response[:150]}...")
        
        # Step 3: Parse JSON
        print("\n3Ô∏è‚É£ Parsing JSON response...")
        try:
            parsed_json = json.loads(response.strip())
            print("   ‚úÖ JSON parsing successful")
            print(f"   üìã Parsed fields: {list(parsed_json.keys())}")
        except json.JSONDecodeError as e:
            print(f"   ‚ö†Ô∏è  Direct JSON parsing failed: {e}")
            print("   üîÑ Trying JSON extraction...")
            
            start_idx = response.find('{')
            end_idx = response.rfind('}')
            if start_idx != -1 and end_idx != -1:
                json_str = response[start_idx:end_idx+1]
                parsed_json = json.loads(json_str)
                print("   ‚úÖ JSON extraction successful")
            else:
                raise ValueError("Could not find JSON object in response")
        
        # Step 4: Validate with Pydantic
        print("\n4Ô∏è‚É£ Validating with Pydantic schema...")
        result = schema_model.model_validate(parsed_json)
        print("   ‚úÖ Pydantic validation successful")
        print(f"   üéØ Model type: {type(result).__name__}")
        
        print(f"\nüéâ DEBUG SUCCESSFUL! Extracted {len(parsed_json)} fields")
        return result
        
    except Exception as e:
        print(f"\n‚ùå DEBUG FAILED at step above")
        print(f"   Error type: {type(e).__name__}")
        print(f"   Error message: {str(e)}")
        
        if "ValidationError" in str(type(e)):
            print(f"   üí° Pydantic validation details:")
            print(f"      {str(e)}")
        
        print(f"\nüîç Raw response for manual inspection:")
        if 'response' in locals():
            print(f"'{response}'")
        
        import traceback
        print(f"\nüìç Full traceback:")
        traceback.print_exc()
        raise e

def quick_test_model_connection():
    """Quick test to verify Ollama model is responding correctly."""
    
    print("üîå Testing Ollama Model Connection...")
    print("=" * 40)
    
    try:
        test_response = llm.invoke("Respond with just the word 'SUCCESS' and nothing else.")
        print(f"‚úÖ Model Response: '{test_response.strip()}'")
        
        if "SUCCESS" in test_response.upper():
            print("üéâ Model connection is working perfectly!")
        else:
            print("‚ö†Ô∏è  Model responded but didn't follow instructions exactly")
            
        return True
        
    except Exception as e:
        print(f"‚ùå Model connection failed: {e}")
        print("üí° Troubleshooting tips:")
        print("   - Check if Ollama server is running: `ollama serve`")
        print("   - Verify model is available: `ollama list`")
        print(f"   - Try pulling the model: `ollama pull {ollama_model}`")
        return False

print("üõ†Ô∏è Debug utilities loaded:")
print("   ‚Ä¢ debug_structured_response() - Step-by-step debugging")
print("   ‚Ä¢ quick_test_model_connection() - Test Ollama connectivity")

## Example 1: Product Review Analysis

Let's test our structured response system with product review analysis:

In [None]:
# Sample product review text
review_text = """
I recently purchased the Sony WH-1000XM4 wireless headphones and I'm absolutely thrilled with them! 
The noise cancellation is phenomenal - I can completely block out airplane noise during flights. 
The sound quality is crisp and clear with excellent bass response. Battery life easily lasts 25+ hours. 
The touch controls are intuitive and responsive.

However, there are a few minor downsides. They can get a bit uncomfortable during very long listening 
sessions (4+ hours), and the carrying case is quite bulky. The price point is also pretty steep at $350.

Despite these small issues, I would definitely recommend these headphones to anyone looking for premium 
noise-cancelling headphones. The audio quality and ANC technology make them worth the investment.
Rating: 4.5/5 stars
"""

try:
    # Extract structured data from the review
    review_data = get_structured_response(review_text, ProductReview)
    
    print("✓ Successfully extracted structured data!\n")
    print("Product Review Analysis:")
    print("=" * 40)
    print(f"Product: {review_data.product_name}")
    print(f"Rating: {review_data.rating}/5 stars")
    print(f"Sentiment: {review_data.sentiment.value}")
    print(f"Would Recommend: {'Yes' if review_data.would_recommend else 'No'}")
    print(f"\nPositives: {', '.join(review_data.pros)}")
    print(f"\nNegatives: {', '.join(review_data.cons)}")
    print(f"\nSummary: {review_data.review_summary}")
    
    # Show the raw JSON for verification
    print("\n" + "="*40)
    print("Raw JSON Output:")
    print(json.dumps(review_data.model_dump(), indent=2))
    
except Exception as e:
    print(f"❌ Error processing review: {str(e)}")
    print("\n💡 Tip: If you encounter errors, call debug_structured_response(review_text, ProductReview), defined above.")

## Example 1b: Product Review with LangChain's with_structured_output

Now let's see the same task using LangChain's native `with_structured_output` method, which provides a cleaner interface with automatic JSON parsing and validation:

In [None]:
# Create a structured LLM using with_structured_output
structured_llm = llm_chat.with_structured_output(ProductReview, method="json_schema")

# Use the same review text
try:
    # Direct invocation - much simpler than manual approach!
    review_data_langchain = structured_llm.invoke(review_text)
    
    print("✓ Successfully extracted structured data using with_structured_output!\n")
    print("Product Review Analysis (LangChain Method):")
    print("=" * 40)
    print(f"Product: {review_data_langchain.product_name}")
    print(f"Rating: {review_data_langchain.rating}/5 stars")
    print(f"Sentiment: {review_data_langchain.sentiment.value}")
    print(f"Would Recommend: {'Yes' if review_data_langchain.would_recommend else 'No'}")
    print(f"\nPositives: {', '.join(review_data_langchain.pros)}")
    print(f"\nNegatives: {', '.join(review_data_langchain.cons)}")
    print(f"\nSummary: {review_data_langchain.review_summary}")
    
    # Show the raw JSON for verification
    print("\n" + "="*40)
    print("Raw JSON Output:")
    print(json.dumps(review_data_langchain.model_dump(), indent=2))
    
    print("\n💡 Notice: This approach is much cleaner - no manual JSON parsing needed!")
    
except Exception as e:
    print(f"❌ Error processing review: {str(e)}")

## Comparison: Manual vs with_structured_output

Let's compare the two approaches we've demonstrated:

In [None]:
print("üìä APPROACH COMPARISON")
print("=" * 80)
print()

comparison = {
    "Feature": ["Code Complexity", "LangChain Integration", "Error Handling", "Prompt Control", 
                "Retry Logic", "JSON Parsing", "Chat Template Support", "Best For"],
    "Manual Approach": [
        "More code (~50 lines)",
        "Uses OllamaLLM",
        "Custom multi-strategy parsing",
        "Full control with tokenizer",
        "Custom @retry decorator",
        "3 fallback strategies (json, extract, json5)",
        "‚úÖ Granite chat templates",
        "Custom workflows, fine-grained control"
    ],
    "with_structured_output": [
        "Minimal code (~3 lines)",
        "Uses ChatOllama",
        "Automatic by LangChain",
        "Standard LangChain prompting",
        "Built-in by LangChain",
        "Automatic JSON parsing",
        "‚ùå Standard chat format",
        "Quick development, LangChain pipelines"
    ]
}

# Print comparison table
print(f"{'Feature':<30} {'Manual Approach':<35} {'with_structured_output':<35}")
print("-" * 100)

for i in range(len(comparison["Feature"])):
    feature = comparison["Feature"][i]
    manual = comparison["Manual Approach"][i]
    langchain = comparison["with_structured_output"][i]
    print(f"{feature:<30} {manual:<35} {langchain:<35}")

print()
print("=" * 80)
print()
print("üéØ RECOMMENDATIONS:")
print()
print("Use Manual Approach when:")
print("  ‚Ä¢ You need fine-grained control over prompt formatting")
print("  ‚Ä¢ You want to use Granite-specific chat templates")
print("  ‚Ä¢ You need custom retry or parsing strategies")
print("  ‚Ä¢ You're building complex production workflows")
print()
print("Use with_structured_output when:")
print("  ‚Ä¢ You want cleaner, more maintainable code")
print("  ‚Ä¢ You're building standard LangChain pipelines")
print("  ‚Ä¢ You prefer LangChain's built-in error handling")
print("  ‚Ä¢ You're prototyping or need rapid development")
print()
print("‚ú® Both approaches are production-ready and reliable!")
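It helps to know what the manual approach's retry policy, `wait_exponential(multiplier=1, min=4, max=10)` with `stop_after_attempt(3)`, actually costs. As a rough sketch, assuming tenacity computes the raw wait as `multiplier * 2 ** (attempt - 1)` and then clamps it into `[min, max]`, the helper below reproduces that schedule in plain Python. The function `exponential_waits` is hypothetical and not part of tenacity.

```python
from typing import List


def exponential_waits(attempts: int, multiplier: float = 1.0, minimum: float = 4.0,
                      maximum: float = 10.0, base: float = 2.0) -> List[float]:
    """Sleep durations between `attempts` tries under a clamped exponential backoff."""
    waits = []
    # A wait follows every failed attempt except the last one
    for attempt in range(1, attempts):
        raw = multiplier * base ** (attempt - 1)
        waits.append(max(minimum, min(raw, maximum)))
    return waits


print(exponential_waits(3))  # [4.0, 4.0]
print(exponential_waits(6))  # [4.0, 4.0, 4.0, 8.0, 10.0]
```

With only three attempts, both waits are clamped up to the 4-second floor, so a request that fails every time spends about 8 seconds sleeping on top of three model calls. The `min` value, not the exponential growth, dominates this configuration.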

## Example 2: Research Paper Information Extraction

Let's create a more complex nested schema for research paper analysis:

In [None]:
class Author(BaseModel):
    """Individual author information."""
    name: str = Field(description="Full name of the author")
    affiliation: Optional[str] = Field(None, description="Institution or organization")
    email: Optional[str] = Field(None, description="Email address if available")

class ResearchPaper(BaseModel):
    """Comprehensive research paper information extraction schema."""
    
    title: str = Field(description="Complete title of the research paper")
    authors: List[Author] = Field(description="List of all authors with their details")
    abstract: str = Field(description="Paper abstract or summary")
    keywords: List[str] = Field(description="Key terms and concepts from the paper")
    publication_year: Optional[int] = Field(None, ge=1900, le=2030, description="Year of publication")
    venue: Optional[str] = Field(None, description="Journal, conference, or publication venue")
    methodology: str = Field(description="Research methods and approach used")
    key_findings: List[str] = Field(description="Main findings and contributions")
    limitations: List[str] = Field(description="Study limitations mentioned by authors")
    future_work: Optional[str] = Field(None, description="Suggested future research directions")
    confidence_score: float = Field(ge=0.0, le=1.0, description="Confidence in extraction accuracy (0-1)")

# Sample research paper abstract
paper_text = """
Title: Deep Learning Approaches for Automated Medical Image Analysis: A Comprehensive Survey

Authors: 
Dr. Sarah Chen (Stanford University Medical Center, schen@stanford.edu)
Prof. Michael Rodriguez (MIT Computer Science Department)
Dr. Aisha Patel (Johns Hopkins Hospital)

Abstract:
This comprehensive survey examines the current state of deep learning applications in medical image analysis. 
We review over 200 recent publications focusing on convolutional neural networks (CNNs), vision transformers, 
and generative adversarial networks (GANs) for medical imaging tasks. Our analysis covers applications in 
radiology, pathology, dermatology, and ophthalmology.

Methodology:
We conducted a systematic literature review of papers published between 2020-2023, analyzing model architectures, 
datasets, evaluation metrics, and clinical validation approaches. Performance metrics were standardized across 
studies for comparative analysis.

Key Findings:
1. Vision transformers show 15% better accuracy than CNNs on chest X-ray analysis
2. Multi-modal approaches combining imaging and clinical data improve diagnostic accuracy by 12%
3. Domain adaptation techniques reduce the need for labeled data by up to 40%
4. Explainable AI methods increase clinician trust and adoption rates

Limitations:
- Limited diversity in datasets, with bias toward certain demographics
- Lack of standardized evaluation protocols across studies  
- Insufficient clinical validation in real-world settings
- Regulatory approval challenges for AI-based diagnostic tools

Future Work:
We identify the need for larger, more diverse datasets, standardized evaluation frameworks, and collaborative 
efforts between AI researchers and medical professionals to bridge the gap between research and clinical practice.

Keywords: deep learning, medical imaging, computer vision, healthcare AI, diagnostic automation
Published: Journal of Medical AI, 2023
"""

try:
    print("Extracting research paper information...")
    paper_data = get_structured_response(
        paper_text, 
        ResearchPaper,
        system_prompt="""You are an expert research paper analyzer. Extract all available information 
        from the given research paper text. Be thorough and accurate. If specific information is not 
        available, use null for optional fields. Assess your confidence in the extraction accuracy.
        Respond ONLY with valid JSON that matches the schema, with no additional text."""
    )
    
    print("✓ Successfully extracted research paper data!\n")
    
    print("Research Paper Analysis:")
    print("=" * 50)
    print(f"Title: {paper_data.title}")
    print(f"Publication Year: {paper_data.publication_year}")
    print(f"Venue: {paper_data.venue}")
    print(f"Confidence Score: {paper_data.confidence_score:.2f}")
    
    print("\nAuthors:")
    for i, author in enumerate(paper_data.authors, 1):
        print(f"  {i}. {author.name}")
        if author.affiliation:
            print(f"     Affiliation: {author.affiliation}")
        if author.email:
            print(f"     Email: {author.email}")
    
    print(f"\nKeywords: {', '.join(paper_data.keywords)}")
    
    print(f"\nAbstract:\n{wrap_text(paper_data.abstract)}")
    
    print(f"\nMethodology:\n{wrap_text(paper_data.methodology)}")
    
    print("\nKey Findings:")
    for i, finding in enumerate(paper_data.key_findings, 1):
        print(f"  {i}. {finding}")
    
    print("\nLimitations:")
    for i, limitation in enumerate(paper_data.limitations, 1):
        print(f"  {i}. {limitation}")
    
    if paper_data.future_work:
        print(f"\nFuture Work:\n{wrap_text(paper_data.future_work)}")
    
except Exception as e:
    print(f"❌ Error processing research paper: {str(e)}")
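Nested models reach the LLM through the JSON schema in the prompt, so it is worth seeing how they are represented there. The sketch below uses a trimmed, self-contained stand-in: hypothetical `DemoAuthor` and `PaperStub` classes rather than the notebook's `Author` and `ResearchPaper`, renamed so running it does not shadow them. It shows that Pydantic v2 places the nested model under `$defs` and points to it with a `$ref`, and that a nested dictionary validates into typed objects.

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class DemoAuthor(BaseModel):
    name: str
    affiliation: Optional[str] = None
    email: Optional[str] = None


class PaperStub(BaseModel):
    """Hypothetical, trimmed stand-in for ResearchPaper."""
    title: str
    authors: List[DemoAuthor]
    publication_year: Optional[int] = Field(None, ge=1900, le=2030)


schema = PaperStub.model_json_schema()
print(sorted(schema["$defs"]))                   # ['DemoAuthor']
print(schema["properties"]["authors"]["items"])  # {'$ref': '#/$defs/DemoAuthor'}

# Illustrative input: nested dicts become DemoAuthor instances after validation
paper = PaperStub.model_validate({
    "title": "Example",
    "authors": [{"name": "A. Author"}, {"name": "B. Author", "affiliation": "Example Lab"}],
})
print(type(paper.authors[0]).__name__, paper.authors[1].affiliation)  # DemoAuthor Example Lab
```

Because the schema is shared through `$defs`, each author entry in the model's output must satisfy the same definition, and missing optional fields simply default to `None`.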

## Reliability Testing and Performance Metrics

Let's test the consistency and reliability of our structured response system across multiple runs:

In [None]:
import time
from typing import Any, Dict, List

def test_consistency(text: str, schema_model: Type[BaseModel], num_runs: int = 5) -> Dict[str, Any]:
    """Test the consistency of structured responses across multiple runs."""
    
    results = []
    errors = []
    response_times = []
    
    print(f"Testing consistency across {num_runs} runs...")
    
    for i in range(num_runs):
        try:
            # perf_counter is a monotonic clock suited to timing; the time includes any internal retries
            start_time = time.perf_counter()
            result = get_structured_response(text, schema_model)
            elapsed = time.perf_counter() - start_time
            
            results.append(result.model_dump())
            response_times.append(elapsed)
            print(f"  Run {i+1}: ✓ Success ({elapsed:.2f}s)")
            
        except Exception as e:
            errors.append(str(e))
            print(f"  Run {i+1}: ❌ Error - {str(e)[:100]}...")
    
    # Analyze consistency
    success_rate = len(results) / num_runs
    avg_response_time = sum(response_times) / len(response_times) if response_times else 0
    
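    # Field consistency heuristic: 1.0 when every successful run agrees; each additional
    # distinct value lowers the score by 1/len(results)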
    field_consistency = {}
    if results:
        first_result = results[0]
        for field in first_result.keys():
            field_values = [result.get(field) for result in results]
            unique_values = len(set(str(v) for v in field_values))
            field_consistency[field] = 1.0 - (unique_values - 1) / len(field_values)
    
    return {
        'success_rate': success_rate,
        'avg_response_time': avg_response_time,
        'field_consistency': field_consistency,
        'total_runs': num_runs,
        'successful_runs': len(results),
        'errors': errors,
        'results': results
    }

# Test with a simple product review
test_review = """
The iPhone 14 is amazing! Great camera quality and battery life lasts all day. 
A bit expensive at $999 but worth it. The display is crisp and bright. 
Only complaint is it gets warm during heavy gaming. Would definitely recommend! 5/5 stars.
"""

metrics = test_consistency(test_review, ProductReview, num_runs=5)

print("\n" + "=" * 50)
print("RELIABILITY METRICS")
print("=" * 50)
print(f"Success Rate: {metrics['success_rate']:.1%} ({metrics['successful_runs']}/{metrics['total_runs']} runs)")
print(f"Average Response Time: {metrics['avg_response_time']:.2f} seconds")

if metrics['field_consistency']:
    print("\nField Consistency Scores:")
    for field, score in metrics['field_consistency'].items():
        print(f"  {field}: {score:.1%}")

if metrics['errors']:
    print(f"\nErrors encountered: {len(metrics['errors'])}")
    for i, error in enumerate(metrics['errors'][:3], 1):  # Show first 3 errors
        print(f"  {i}. {error[:100]}...")
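The field-consistency score above is `1 - (unique_values - 1) / runs`, so with 5 runs it never drops below 0.2, and it gives a 4-to-1 split the same 0.8 as a 3-to-2 split. A common alternative, sketched here as an option rather than the notebook's method, is the *modal agreement rate*: the share of runs that match the most frequent value. The hypothetical `modal_agreement` helper takes the same list of `model_dump()` dictionaries that `test_consistency` collects in `results`.

```python
from collections import Counter
from typing import Any, Dict, List


def modal_agreement(results: List[Dict[str, Any]]) -> Dict[str, float]:
    """Per field, the fraction of runs that agree with the most common value."""
    if not results:
        return {}
    scores = {}
    for field in results[0]:
        # str() makes lists and nested dicts hashable, as test_consistency does
        counts = Counter(str(run.get(field)) for run in results)
        scores[field] = counts.most_common(1)[0][1] / len(results)
    return scores


# Illustrative runs: ratings split 4-to-1, sentiments split 3-to-2
runs = [
    {"rating": 5.0, "sentiment": "positive"},
    {"rating": 5.0, "sentiment": "positive"},
    {"rating": 5.0, "sentiment": "positive"},
    {"rating": 5.0, "sentiment": "neutral"},
    {"rating": 4.5, "sentiment": "neutral"},
]
print(modal_agreement(runs))  # {'rating': 0.8, 'sentiment': 0.6}
```

To use it with the metrics above, call `modal_agreement(metrics['results'])`.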

## Error Handling Demonstration

Let's demonstrate how our system handles various error scenarios and malformed responses:

In [None]:
def test_error_scenarios():
    """Test various error scenarios and recovery mechanisms."""
    
    scenarios = [
        {
            'name': 'Ambiguous Text',
            'text': 'This is some random text with no clear product information.',
            'expected': 'Should handle missing information gracefully'
        },
        {
            'name': 'Mixed Language',
            'text': 'Le iPhone es muy bueno, great camera, 4 étoiles, price $800.',
            'expected': 'Should extract available information despite mixed languages'
        },
        {
            'name': 'Contradictory Information',
            'text': 'iPhone 15 is terrible, worst phone ever! Amazing camera though. 5 stars! Would not recommend.',
            'expected': 'Should handle contradictory statements'
        }
    ]
    
    print("Testing Error Handling Scenarios")
    print("=" * 40)
    
    for scenario in scenarios:
        print(f"\nüß™ Testing: {scenario['name']}")
        print(f"Expected: {scenario['expected']}")
        print(f"Input: {scenario['text'][:100]}...")
        
        try:
            result = get_structured_response(scenario['text'], ProductReview)
            print("‚úÖ Successfully extracted data:")
            print(f"   Product: {result.product_name}")
            print(f"   Rating: {result.rating}/5")
            print(f"   Sentiment: {result.sentiment}")
            print(f"   Recommend: {result.would_recommend}")
            
        except Exception as e:
            print(f"‚ùå Failed to extract: {str(e)[:150]}...")
    
test_error_scenarios()
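Schema validation cannot catch contradictions like the third scenario: a rating of 5, a negative sentiment, and `would_recommend=False` are each valid on their own. A lightweight post-check can flag such records for human review instead of silently accepting them. The sketch below is a hypothetical helper, and its thresholds (4 or more counts as a high rating, 2 or less as a low one) are illustrative assumptions, not part of the schema. It works on a plain dictionary such as the one returned by `result.model_dump(mode="json")`.

```python
from typing import Any, Dict, List


def flag_inconsistencies(review: Dict[str, Any]) -> List[str]:
    """Return warnings for fields that contradict each other.

    Thresholds are illustrative assumptions, not part of the notebook's schema.
    """
    warnings = []
    rating = review.get("rating")
    sentiment = review.get("sentiment")
    recommend = review.get("would_recommend")

    if rating is not None and sentiment is not None:
        if rating >= 4 and sentiment == "negative":
            warnings.append(f"high rating ({rating}) but negative sentiment")
        if rating <= 2 and sentiment == "positive":
            warnings.append(f"low rating ({rating}) but positive sentiment")
    if recommend is False and sentiment == "positive":
        warnings.append("positive sentiment but not recommended")
    if recommend is True and sentiment == "negative":
        warnings.append("negative sentiment but recommended")
    return warnings


contradictory = {"rating": 5.0, "sentiment": "negative", "would_recommend": False}
print(flag_inconsistencies(contradictory))  # ['high rating (5.0) but negative sentiment']
```

Keeping this check outside the Pydantic model means a contradictory extraction is still returned and can be inspected, rather than raising and triggering another retry.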

## Advanced Pydantic Features

Let's explore more advanced Pydantic features like custom validators, computed fields, and conditional schemas:
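A minimal sketch of custom validators and computed fields, using a hypothetical `CheckedReview` model that is not used elsewhere in the notebook: a `field_validator` normalizes a string, a `model_validator(mode="after")` enforces a cross-field rule, and a `computed_field` derives a value that is included in `model_dump()`. The rule itself (a review rated below 2 cannot recommend the product) is an illustrative assumption.

```python
from typing import List

from pydantic import (BaseModel, Field, ValidationError, computed_field,
                      field_validator, model_validator)


class CheckedReview(BaseModel):
    product_name: str
    rating: float = Field(ge=1, le=5)
    pros: List[str] = Field(default_factory=list)
    cons: List[str] = Field(default_factory=list)
    would_recommend: bool

    @field_validator("product_name")
    @classmethod
    def normalize_name(cls, v: str) -> str:
        # Collapse leading, trailing, and repeated whitespace
        return " ".join(v.split())

    @model_validator(mode="after")
    def check_recommendation(self) -> "CheckedReview":
        # Illustrative cross-field rule: a very low rating cannot come with a recommendation
        if self.rating < 2 and self.would_recommend:
            raise ValueError("would_recommend=True contradicts a rating below 2")
        return self

    @computed_field
    @property
    def net_points(self) -> int:
        """Number of pros minus number of cons."""
        return len(self.pros) - len(self.cons)


ok = CheckedReview(product_name="  Sony   WH-1000XM4 ", rating=4.5,
                   pros=["ANC", "battery"], cons=["case"], would_recommend=True)
print(ok.product_name, ok.net_points)  # Sony WH-1000XM4 1
print(ok.model_dump()["net_points"])   # 1

err_count = 0
try:
    CheckedReview(product_name="Cheap Buds", rating=1, would_recommend=True)
except ValidationError as err:
    err_count = err.error_count()
print(err_count)  # 1
```

When a model like this is passed to `get_structured_response`, a failed cross-field rule surfaces as a `ValidationError`, so the existing parsing strategies and `@retry` logic treat it like any other malformed response.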

In [None]:
def demonstrate_best_practices():
    """Demonstrate best practices for structured responses."""
    
    print("üéØ BEST PRACTICES FOR STRUCTURED RESPONSES")
    print("=" * 50)
    
    practices = [
        {
            'title': '1. Choose the Right Approach',
            'description': 'Select between with_structured_output and manual parsing based on needs',
            'example': '''
# with_structured_output - Quick and clean ✅
structured_llm = llm_chat.with_structured_output(MySchema)
result = structured_llm.invoke(text)

# Manual - Fine-grained control ✅
result = get_structured_response(text, MySchema)
'''
        },
        {
            'title': '2. Clear Schema Definitions',
            'description': 'Use descriptive field names and detailed Field descriptions',
            'example': '''
# Good ✅
class Review(BaseModel):
    rating: int = Field(ge=1, le=5, description="User rating from 1-5 stars")
    sentiment: str = Field(description="Overall sentiment: positive, negative, neutral")

# Avoid ❌
class Review(BaseModel):
    r: int  # Unclear field name
    s: str  # No description
'''
        },
        {
            'title': '3. Robust Error Handling',
            'description': 'Implement proper error handling for both approaches',
            'example': '''
# with_structured_output approach:
try:
    result = structured_llm.invoke(text)
except Exception as e:
    print(f"Extraction failed: {e}")
    # Fallback logic

# Manual approach has built-in multi-strategy parsing + @retry
'''
        },
        {
            'title': '4. Validation and Constraints',
            'description': 'Use Pydantic validators to ensure data quality',
            'example': '''
# Use constraints and custom validators:
@field_validator('email')
@classmethod
def validate_email(cls, v):
    if v and '@' not in v:
        raise ValueError('Invalid email format')
    return v
'''
        },
        {
            'title': '5. Performance Optimization',
            'description': 'Optimize for speed and reliability',
            'example': '''
# Performance tips:
- Use low temperature (0.1-0.2) for consistency
- Cache LLM instances (don't recreate each time)
- Use with_structured_output for simpler schemas
- Use manual approach when you need retry strategies
- Monitor response times and success rates
'''
        }
    ]
    
    for practice in practices:
        print(f"\n{practice['title']}")
        print(f"{practice['description']}")
        if practice['example']:
            print(practice['example'])
    
    print("\n" + "="*50)
    print("üîß COMMON TROUBLESHOOTING")
    print("="*50)
    
    issues = [
        {
            'problem': 'JSON Parsing Errors',
            'solutions': [
                'Try with_structured_output first - it handles parsing automatically',
                'For manual: Check for extra text before/after JSON',
                'Use json5 for more lenient parsing (manual approach includes this)',
                'Improve prompt clarity about JSON-only responses'
            ]
        },
        {
            'problem': 'Validation Errors',
            'solutions': [
                'Review field constraints (min/max values)',
                'Make optional fields truly optional with Optional[Type]',
                'Add custom validators for complex logic',
                'Provide clear examples in prompts'
            ]
        },
        {
            'problem': 'Inconsistent Results',
            'solutions': [
                'Lower temperature for more deterministic outputs',
                'Improve prompt specificity',
                'Use multiple validation runs to test consistency',
                'Consider using method="json_schema" with with_structured_output'
            ]
        },
        {
            'problem': 'Performance Issues', 
            'solutions': [
                'Use with_structured_output for faster development',
                'Optimize prompt length in manual approach',
                'Use an appropriately sized Granite model (smaller variants are faster, larger ones are usually more accurate)',
                'Implement caching for repeated requests',
                'Consider batch processing'
            ]
        }
    ]
    
    for issue in issues:
        print(f"\n❗ {issue['problem']}:")
        for solution in issue['solutions']:
            print(f"   • {solution}")

demonstrate_best_practices()
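The troubleshooting advice about extra text before or after the JSON can be made concrete with a minimal sketch that uses only the standard library. `extract_first_json_object` is a hypothetical helper, not the parser inside `get_structured_response`. It works in two steps:

1. It strips Markdown code fences.
2. It uses `json.JSONDecoder.raw_decode` to parse the first complete JSON object.

Because `raw_decode` stops at the end of that object, braces in trailing chatter cannot corrupt the result. A `find('{')`/`rfind('}')` slice offers no such protection.

```python
import json
import re
from typing import Any, Dict

FENCE = "`" * 3  # a Markdown code fence, built here so the snippet renders cleanly


def extract_first_json_object(text: str) -> Dict[str, Any]:
    """Return the first JSON object in an LLM reply (hypothetical helper)."""
    # Remove Markdown fences (optionally tagged "json") that models often add
    cleaned = re.sub(re.escape(FENCE) + r"(?:json)?", "", text)
    decoder = json.JSONDecoder()
    # Try each '{' as a candidate start; raw_decode stops at the end of that
    # object, so braces in any trailing text are never included
    for match in re.finditer(r"\{", cleaned):
        try:
            obj, _ = decoder.raw_decode(cleaned, match.start())
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            return obj
    raise ValueError("No JSON object found in response")


reply = (
    "Sure! Here is the data:\n" + FENCE + 'json\n{"rating": 4.5, "pros": ["ANC"]}\n'
    + FENCE + "\nLet me know {if} you need more."
)
print(extract_first_json_object(reply))  # {'rating': 4.5, 'pros': ['ANC']}
```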

## Conclusion

This notebook demonstrated the power of combining Pydantic validation with local Ollama inference for generating reliable, structured responses from IBM Granite models. We explored **two production-ready approaches** that serve different needs.

### Key Takeaways:

1. **Two Methods, Both Reliable**: 
   - **with_structured_output**: LangChain's native method for clean, minimal code
   - **Manual Parsing**: Custom approach with fine-grained control and multiple fallback strategies

2. **Type Safety**: Pydantic validates data at runtime, and the type hints on each model let static checkers such as mypy catch misuse before the code runs
3. **Error Handling**: Both approaches offer robust error handling (automatic vs custom)
4. **Local Control**: Ollama provides privacy, offline capability, and eliminates API dependencies
5. **Flexibility**: Choose the method that fits your workflow and requirements
6. **Production Ready**: Advanced features like validators, computed fields, and retry logic

### What Makes This Notebook Unique:

Unlike the [Entity Extraction recipe](../Entity-Extraction/entity_extraction.ipynb) which focuses on cloud APIs (Replicate/Watsonx), this notebook is designed for:

- **Local Deployment**: Privacy-focused, offline-capable infrastructure
- **Production Patterns**: Enterprise-grade error handling, debugging, and monitoring
- **Advanced Features**: Custom validators, computed fields, multiple parsing strategies
- **Method Comparison**: Understand the trade-offs between the LangChain-native and custom approaches
- **Comprehensive Testing**: Reliability testing, performance benchmarking, best practices

### When to Use This Approach:

- ✅ **Local/Offline Deployment**: Privacy requirements or air-gapped environments
- ✅ **Data extraction from unstructured text** with validation
- ✅ **API response standardization** with type safety
- ✅ **Production systems needing reliable outputs** with error recovery
- ✅ **Cost Control**: No per-request API charges

### Performance vs. Alternatives:

- **vs. Raw JSON**: Higher reliability, better error handling, type safety
- **vs. Cloud APIs**: Privacy, cost control, offline capability
- **vs. Entity Extraction Recipe**: Local deployment, production patterns, advanced Pydantic features
- **Manual vs. with_structured_output**: Control vs. simplicity; both are excellent choices

### Method Selection Guide:

**Choose `with_structured_output` when:**
- Building standard LangChain pipelines
- Prioritizing clean, maintainable code
- Prototyping or rapid development
- Working with straightforward schemas

**Choose the manual approach when you:**
- Need Granite-specific chat templates
- Require custom retry or parsing strategies
- Are building complex production workflows
- Need fine-grained prompt control
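One common custom retry strategy feeds the validation errors back to the model and asks for a corrected answer. Here is a minimal sketch, assuming Pydantic v2 and a hypothetical `generate` callable. `generate` takes a prompt string and returns the model's raw text; in practice it could be a thin wrapper around the notebook's Ollama LLM. `extract_with_feedback` and `Rating` are illustrative names, not LangChain APIs.

```python
from typing import Callable, Optional, Type, TypeVar

from pydantic import BaseModel, Field, ValidationError

T = TypeVar("T", bound=BaseModel)


def extract_with_feedback(
    generate: Callable[[str], str],
    prompt: str,
    schema: Type[T],
    max_attempts: int = 3,
) -> T:
    """Validate the model's reply; on failure, re-prompt with the errors (sketch)."""
    current_prompt = prompt
    last_error: Optional[ValidationError] = None
    for _ in range(max_attempts):
        raw = generate(current_prompt)
        try:
            # model_validate_json parses and validates in one step;
            # malformed JSON also surfaces as a ValidationError
            return schema.model_validate_json(raw)
        except ValidationError as err:
            last_error = err
            # Show the model exactly what was wrong so the next attempt can fix it
            current_prompt = (
                f"{prompt}\n\nYour previous answer was invalid:\n{err}\n"
                "Return corrected JSON only."
            )
    raise RuntimeError(f"No valid output after {max_attempts} attempts") from last_error


class Rating(BaseModel):
    rating: int = Field(ge=1, le=5)


# Stand-in model for illustration: the first reply breaks the constraint, the second is valid
replies = iter(['{"rating": 9}', '{"rating": 4}'])
result = extract_with_feedback(lambda p: next(replies), "Rate this review from 1 to 5.", Rating)
print(result.rating)  # 4
```

Each retry sends a longer prompt, so keep `max_attempts` small to bound latency. Pairing this loop with a lenient JSON cleanup step avoids spending a retry on formatting noise alone.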

### Next Steps:

1. **Scale Up**: Implement batch processing for large datasets
2. **Monitoring**: Add logging and metrics collection
3. **Custom Models**: Fine-tune Granite models for specific domains
4. **Integration**: Build into production pipelines
5. **Hybrid Approach**: Use with_structured_output for prototyping, manual for production edge cases
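The first two next steps can start small. Here is a minimal sketch, assuming an `extract` callable that maps one text to a validated object. For instance, `extract` could be a wrapper that calls `get_structured_response` with a fixed schema. `run_batch` and `BatchReport` are hypothetical names.

The sketch runs extractions on a thread pool. It logs each failure instead of aborting the batch, then reports success rate and mean latency. How much the threads help depends on how many requests the local Ollama server is configured to process in parallel.

```python
import logging
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Tuple

logger = logging.getLogger("structured_batch")


@dataclass
class BatchReport:
    results: List[Optional[Any]] = field(default_factory=list)  # None marks a failed item
    latencies: List[float] = field(default_factory=list)  # seconds per item

    @property
    def success_rate(self) -> float:
        ok = sum(r is not None for r in self.results)
        return ok / len(self.results) if self.results else 0.0

    @property
    def mean_latency(self) -> float:
        return sum(self.latencies) / len(self.latencies) if self.latencies else 0.0


def run_batch(extract: Callable[[str], Any], texts: List[str], max_workers: int = 4) -> BatchReport:
    def timed(text: str) -> Tuple[Optional[Any], float]:
        start = time.perf_counter()
        try:
            return extract(text), time.perf_counter() - start
        except Exception:
            # Log with traceback and keep going so one bad input does not sink the batch
            logger.exception("Extraction failed for input %r", text[:40])
            return None, time.perf_counter() - start

    report = BatchReport()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so results[i] matches texts[i]
        for result, latency in pool.map(timed, texts):
            report.results.append(result)
            report.latencies.append(latency)
    logger.info("Batch done: %.0f%% success, %.2fs mean latency",
                100 * report.success_rate, report.mean_latency)
    return report


# Usage with a stand-in extractor; swap in a real Ollama-backed function
logging.basicConfig(level=logging.INFO)


def fake_extract(text: str) -> dict:
    if not text:
        raise ValueError("empty input")
    return {"length": len(text)}


report = run_batch(fake_extract, ["great", "", "meh"])
print(f"{report.success_rate:.2f}")  # 0.67
```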

The combination of Pydantic's robust validation with Granite's language understanding and Ollama's local deployment creates a powerful foundation for reliable structured data extraction in production systems, whether you choose the LangChain-native or custom approach.

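## Comparing Validated and Unstructured Responses

To gauge what schema validation adds, the next cell extracts the same review several times using two methods:

- `get_structured_response`, which validates against the Pydantic schema,
- a plain JSON prompt that skips Pydantic (`get_unstructured_response`).

It then compares success rate, data validity, and average response time. With only a few runs on a single review, treat the numbers as a rough indication rather than a benchmark.

In [None]: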
def get_unstructured_response(text: str, fields: List[str]) -> Dict[str, Any]:
    """Get response without Pydantic validation for comparison."""
    
    prompt = f"""Extract the following information from the text and return as JSON:
    Fields to extract: {', '.join(fields)}
    
    Text: {text}
    
    Return valid JSON only:"""
    
    response = llm.invoke(prompt)
    
    # Simple JSON parsing without validation
    try:
        return json.loads(response.strip())
    except json.JSONDecodeError:
        # Fall back to the outermost {...} span; this can still fail if the
        # surrounding text also contains braces
        start_idx = response.find('{')
        end_idx = response.rfind('}')
        if start_idx != -1 and end_idx > start_idx:
            return json.loads(response[start_idx:end_idx+1])
        raise ValueError("Could not parse JSON response")

def compare_approaches(text: str, num_runs: int = 5):
    """Compare Pydantic-validated vs unstructured approaches."""
    
    fields = ['product_name', 'rating', 'sentiment', 'pros', 'cons', 'would_recommend']
    
    print(f"Comparing approaches over {num_runs} runs...\n")
    
    # Test with Pydantic validation
    pydantic_success = 0
    pydantic_times = []
    pydantic_valid_data = 0
    
    print("üîç Testing WITH Pydantic Validation:")
    for i in range(num_runs):
        try:
            start_time = time.time()
            result = get_structured_response(text, ProductReview)
            end_time = time.time()
            
            pydantic_success += 1
            pydantic_times.append(end_time - start_time)
            
            # Check data validity
            if (1 <= result.rating <= 5 and 
                isinstance(result.pros, list) and 
                isinstance(result.cons, list) and
                isinstance(result.would_recommend, bool)):
                pydantic_valid_data += 1
            
            print(f"  Run {i+1}: ✓ Success")

        except Exception as e:
            print(f"  Run {i+1}: ❌ Failed ({type(e).__name__})")
    
    # Test without Pydantic validation
    unstructured_success = 0
    unstructured_times = []
    unstructured_valid_data = 0
    
    print("\nüîç Testing WITHOUT Pydantic Validation:")
    for i in range(num_runs):
        try:
            start_time = time.time()
            result = get_unstructured_response(text, fields)
            end_time = time.time()
            
            unstructured_success += 1
            unstructured_times.append(end_time - start_time)
            
            # Check data validity (more lenient)
            rating = result.get('rating', 0)
            pros = result.get('pros', [])
            cons = result.get('cons', [])
            recommend = result.get('would_recommend', False)
            
            if (isinstance(rating, (int, float)) and 1 <= rating <= 5 and
                isinstance(pros, list) and isinstance(cons, list) and
                isinstance(recommend, bool)):
                unstructured_valid_data += 1
            
            print(f"  Run {i+1}: ✓ Success")

        except Exception as e:
            print(f"  Run {i+1}: ❌ Failed ({type(e).__name__})")
    
    # Results comparison
    print("\n" + "=" * 60)
    print("üìä COMPARISON RESULTS")
    print("=" * 60)
    
    pydantic_avg_time = sum(pydantic_times) / len(pydantic_times) if pydantic_times else 0
    unstructured_avg_time = sum(unstructured_times) / len(unstructured_times) if unstructured_times else 0
    
    print(f"\n{'Metric':<25} {'Pydantic':<15} {'Unstructured':<15} {'Winner':<10}")
    print("-" * 65)
    
    # Success Rate
    pydantic_success_rate = pydantic_success / num_runs
    unstructured_success_rate = unstructured_success / num_runs
    if pydantic_success_rate > unstructured_success_rate:
        success_winner = "Pydantic"
    elif unstructured_success_rate > pydantic_success_rate:
        success_winner = "Unstructured" 
    else:
        success_winner = "Tie"
    pydantic_success_pct = f"{pydantic_success_rate:.1%}"
    unstructured_success_pct = f"{unstructured_success_rate:.1%}"
    print(f"{'Success Rate':<25} {pydantic_success_pct:<15} {unstructured_success_pct:<15} {success_winner:<10}")
    
    # Data Validity
    pydantic_validity_rate = pydantic_valid_data / max(pydantic_success, 1)
    unstructured_validity_rate = unstructured_valid_data / max(unstructured_success, 1)
    if pydantic_validity_rate > unstructured_validity_rate:
        validity_winner = "Pydantic"
    elif unstructured_validity_rate > pydantic_validity_rate:
        validity_winner = "Unstructured"
    else:
        validity_winner = "Tie"
    pydantic_validity_pct = f"{pydantic_validity_rate:.1%}"
    unstructured_validity_pct = f"{unstructured_validity_rate:.1%}"
    print(f"{'Data Validity Rate':<25} {pydantic_validity_pct:<15} {unstructured_validity_pct:<15} {validity_winner:<10}")
    
    # Response Time
    if pydantic_avg_time < unstructured_avg_time:
        time_winner = "Pydantic"
    elif unstructured_avg_time < pydantic_avg_time:
        time_winner = "Unstructured"
    else:
        time_winner = "Tie"
    pydantic_time_str = f"{pydantic_avg_time:.2f}s"
    unstructured_time_str = f"{unstructured_avg_time:.2f}s"
    print(f"{'Avg Response Time':<25} {pydantic_time_str:<15} {unstructured_time_str:<15} {time_winner:<10}")
    
    # Overall Reliability
    pydantic_reliability = pydantic_success_rate * pydantic_validity_rate
    unstructured_reliability = unstructured_success_rate * unstructured_validity_rate
    if pydantic_reliability > unstructured_reliability:
        reliability_winner = "Pydantic"
    elif unstructured_reliability > pydantic_reliability:
        reliability_winner = "Unstructured"
    else:
        reliability_winner = "Tie"
    pydantic_reliability_pct = f"{pydantic_reliability:.1%}"
    unstructured_reliability_pct = f"{unstructured_reliability:.1%}"
    print(f"{'Overall Reliability':<25} {pydantic_reliability_pct:<15} {unstructured_reliability_pct:<15} {reliability_winner:<10}")
    
    # Summary Analysis
    print(f"\nüéØ ANALYSIS:")
    if success_winner == "Tie" and validity_winner == "Tie":
        print("‚úÖ Both approaches achieved identical reliability for this simple, well-structured text")
        if unstructured_avg_time < pydantic_avg_time:
            time_diff = pydantic_avg_time - unstructured_avg_time
            percent_improvement = (time_diff / pydantic_avg_time) * 100
            print(f"‚ö° Unstructured is {time_diff:.1f}s faster ({percent_improvement:.1f}% improvement)")
        print("üí° Pydantic's benefits emerge with complex, ambiguous, or malformed text")
    else:
        print("üìä Performance varies - check individual metrics for trade-offs")

# Run comparison test
comparison_review = """
AirPods Pro 2nd gen are fantastic! Noise cancellation is top-notch, sound quality 
is crisp, and the spatial audio feature is immersive. Battery life is solid at 6 hours 
with ANC on. The transparency mode works perfectly for hearing surroundings.

Cons: Expensive at $249, case is a fingerprint magnet, and sometimes connectivity 
can be finicky with non-Apple devices. 

Overall: 4.5/5 stars, highly recommend for iPhone users!
"""

compare_approaches(comparison_review, num_runs=3)
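The hand-written validity check for the unstructured runs re-creates part of what a schema does. Here is a minimal sketch, assuming Pydantic v2 and a hypothetical `ReviewCheck` model that mirrors the names in `fields`. It is not the notebook's `ProductReview` class. Calling `model_validate` on the parsed dict applies the range and type checks in one place and reports every failing field together.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class ReviewCheck(BaseModel):
    """Hypothetical schema mirroring the fields requested in compare_approaches."""

    product_name: str
    rating: float = Field(ge=1, le=5)
    sentiment: str
    pros: List[str]
    cons: List[str]
    would_recommend: bool


raw = {
    "product_name": "AirPods Pro 2nd gen",
    "rating": "4.5",           # string from the LLM: coerced to 4.5 in lax mode
    "sentiment": "positive",
    "pros": ["noise cancellation", "sound quality"],
    "cons": ["price"],
    "would_recommend": "yes",  # lax mode accepts common boolean strings
}
review = ReviewCheck.model_validate(raw)
print(review.rating, review.would_recommend)  # 4.5 True

bad = dict(raw, rating=7, pros="noise cancellation")
try:
    ReviewCheck.model_validate(bad)
except ValidationError as err:
    # Both problems are reported together, with the offending field paths
    print([e["loc"] for e in err.errors()])  # [('rating',), ('pros',)]
```

Pydantic's default lax mode coerces values such as the string `"4.5"` or `"yes"`, which the `isinstance` checks above would count as invalid. Setting `model_config = ConfigDict(strict=True)` on the model turns that coercion off.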

## References and Further Reading

1. [Pydantic Documentation](https://docs.pydantic.dev/) - Comprehensive guide to Pydantic features
2. [Ollama Documentation](https://ollama.ai/docs) - Setup and usage guides for local LLM inference
3. [IBM Granite Models](https://www.ibm.com/granite) - Information about Granite model capabilities
4. [Structured Response Comparison Notebook](./Structured_Responses_LMStudio.ipynb) - Alternative approach using LMStudio
5. [Entity Extraction Recipe](../Entity-Extraction/entity_extraction.ipynb) - Related structured data extraction examples