In [None]:
# Structured Output Cookbook - Usage Examples

This notebook demonstrates how to use the Structured Output Cookbook for extracting structured data from text using LLMs.

## Features
- 🎯 **Predefined Templates**: Ready-to-use schemas for common tasks
- 🛠️ **Custom YAML Schemas**: Create your own extraction schemas  
- 🔍 **Schema Validation**: Automatic validation of schema structure
- 📊 **Rich Output**: Detailed extraction results with token usage

## Prerequisites
Make sure you have your OpenAI API key set as an environment variable:
```bash
export OPENAI_API_KEY="your-api-key-here"
```
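
A quick check before running the cells below can catch a missing key early. This is only a minimal sketch: the helper name `missing_env_vars` is illustrative and is not part of the library.

```python
import os
from typing import Iterable, List, Mapping


def missing_env_vars(names: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names that are unset or blank in the given environment mapping."""
    return [name for name in names if not env.get(name, "").strip()]


missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("OPENAI_API_KEY is set")
```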


In [1]:
# Import required modules
import os
import json
from structured_output_cookbook import (
    StructuredExtractor, 
    SchemaLoader, 
    JobDescriptionSchema, 
    RecipeSchema,
    Config
)

# Initialize the system
print("🚀 Initializing Structured Output Cookbook...")

# Load configuration
config = Config.from_env()
extractor = StructuredExtractor(config)

print("✅ System initialized successfully!")


🚀 Initializing Structured Output Cookbook...
✅ System initialized successfully!


In [2]:
# 🔧 Optional: Setup minimal logging for cleaner notebook output
from structured_output_cookbook.utils import setup_minimal_logger

# Reduce log verbosity: with level="ERROR", only errors and critical messages are shown
# Comment out this line if you want to see detailed logs
setup_minimal_logger(level="ERROR")  # Options: DEBUG, INFO, WARNING, ERROR, CRITICAL

print("📝 Minimal logging configured - only errors will be shown")


📝 Minimal logging configured - only errors will be shown


In [None]:
## 📝 Example 1: Using Predefined Templates

The library comes with built-in templates for common extraction tasks like recipes and job descriptions.
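
One detail worth knowing when displaying extracted data: a field the model could not fill may be present with a null value rather than absent, in which case `data.get('field', 'N/A')` returns `None` instead of the fallback. The helper below is a small sketch that handles both cases, assuming `result.data` behaves like a plain dictionary; the name `field_or_default` is illustrative and not part of the library.

```python
from typing import Any, Mapping


def field_or_default(data: Mapping[str, Any], key: str, default: str = "N/A") -> Any:
    """Return data[key], falling back to default when the key is missing, None, or ''."""
    value = data.get(key)
    if value is None or value == "":
        return default
    return value


# Illustrative record with one null field and one missing field.
sample = {"name": "Spaghetti Carbonara", "total_time": None, "servings": 4}
print(field_or_default(sample, "name"))
print(field_or_default(sample, "total_time"))
print(field_or_default(sample, "cuisine"))
```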


In [3]:
# Example 1: Recipe Extraction
recipe_text = """
Spaghetti Carbonara - The Classic Italian Recipe

Ingredients:
- 400g spaghetti pasta
- 200g guanciale or pancetta, diced
- 4 large eggs
- 100g Pecorino Romano cheese, grated
- Freshly ground black pepper
- Salt for pasta water

Instructions:
1. Bring a large pot of salted water to boil and cook spaghetti until al dente (about 8-10 minutes)
2. While pasta cooks, fry guanciale in a large pan until crispy (about 5 minutes)
3. In a bowl, whisk together eggs, grated cheese, and plenty of black pepper
4. Drain pasta, reserving 1 cup of pasta water
5. Add hot pasta to the pan with guanciale
6. Remove from heat and quickly stir in egg mixture, adding pasta water as needed
7. Serve immediately with extra cheese and pepper

Serves 4 people. Preparation time: 10 minutes. Cooking time: 15 minutes.
Difficulty: Medium. Cuisine: Italian.
"""

print("🍝 Extracting recipe information...")
result = extractor.extract(recipe_text, RecipeSchema)

if result.success and result.data:
    print("✅ Recipe extraction successful!\n")
    
    # Display key information
    data = result.data
    print(f"📖 Recipe: {data.get('name', 'N/A')}")
    print(f"🍽️ Serves: {data.get('servings', 'N/A')} people")
    print(f"⏱️ Total time: {data.get('total_time', 'N/A')}")
    print(f"🏷️ Difficulty: {data.get('difficulty', 'N/A')}")
    print(f"🌍 Cuisine: {data.get('cuisine', 'N/A')}")
    
    ingredients = data.get('ingredients', [])
    print(f"\n🛒 Ingredients ({len(ingredients)} items):")
    for i, ingredient in enumerate(ingredients[:5], 1):  # Show first 5
        # Skip missing parts so null quantities or units are not printed as "None"
        parts = [ingredient.get('quantity'), ingredient.get('unit'), ingredient.get('name') or 'Unknown']
        line = " ".join(str(part) for part in parts if part not in (None, ''))
        print(f"  {i}. {line}")
    if len(ingredients) > 5:
        print(f"  ... and {len(ingredients) - 5} more")
    
    tokens = result.tokens_used or 0
    print(f"\n📊 Tokens used: {tokens}")
    print(f"💰 Estimated cost: ~${(tokens * 0.00001):.4f}")
else:
    print(f"❌ Recipe extraction failed: {result.error}")


🍝 Extracting recipe information...
✅ Recipe extraction successful!

📖 Recipe: Spaghetti Carbonara
🍽️ Serves: 4 people
⏱️ Total time: None
🏷️ Difficulty: Medium
🌍 Cuisine: Italian

🛒 Ingredients (6 items):
  1. 400 g spaghetti pasta
  2. 200 g guanciale or pancetta
  3. 4 large eggs
  4. 100 g Pecorino Romano cheese
  5. Freshly ground black pepper
  ... and 1 more

📊 Tokens used: 1094
💰 Estimated cost: ~$0.0109
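
The cost line above multiplies the total token count by a flat `0.00001`, which amounts to assuming $10 per million tokens regardless of direction. Providers commonly price input and output tokens separately, so a slightly more careful estimate could look like the sketch below. The prices and the prompt/completion split are hypothetical; the notebook only shows a combined `tokens_used` value.

```python
def estimate_cost(
    prompt_tokens: int,
    completion_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate the USD cost of one request from token counts and per-million-token prices."""
    input_cost = prompt_tokens * input_price_per_million / 1_000_000
    output_cost = completion_tokens * output_price_per_million / 1_000_000
    return input_cost + output_cost


# Hypothetical split of the 1094 tokens reported above, with hypothetical prices.
cost = estimate_cost(900, 194, input_price_per_million=10.0, output_price_per_million=10.0)
print(f"Estimated cost: ~${cost:.4f}")
```

With both prices set to the same value, this reduces to the flat-rate formula used in the cell.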


In [4]:
# Example 2: Job Description Extraction
job_text = """
Senior Python Developer - Remote Position

TechCorp is seeking a Senior Python Developer to join our growing engineering team. 
This is a full-time remote position with competitive salary and excellent benefits.

Location: Remote (US timezone preferred)
Salary: $120,000 - $150,000 per year
Experience Level: Senior (5+ years)

Required Skills:
- Python programming (5+ years experience)
- Django or Flask frameworks
- PostgreSQL or MySQL databases
- RESTful API development
- Git version control
- Docker containerization

Preferred Skills:
- AWS cloud services
- React.js frontend development
- CI/CD pipelines
- Microservices architecture

Responsibilities:
- Design and develop scalable web applications
- Collaborate with cross-functional teams
- Code review and mentoring junior developers
- Participate in architecture decisions
- Write comprehensive unit tests

Benefits:
- Health, dental, and vision insurance
- 401(k) with company matching
- Flexible vacation policy
- Professional development budget
- Home office stipend

Remote work available. Apply now to join our innovative team!
"""

print("💼 Extracting job description information...")
result = extractor.extract(job_text, JobDescriptionSchema)

if result.success and result.data:
    print("✅ Job extraction successful!\n")
    
    data = result.data
    print(f"🎯 Position: {data.get('title', 'N/A')}")
    print(f"🏢 Company: {data.get('company', 'N/A')}")
    print(f"📍 Location: {data.get('location', 'N/A')}")
    print(f"💼 Type: {data.get('employment_type', 'N/A')}")
    print(f"📈 Level: {data.get('experience_level', 'N/A')}")
    print(f"💰 Salary: {data.get('salary_range', 'N/A')}")
    print(f"🏠 Remote: {'Yes' if data.get('remote_work') else 'No'}")
    
    required_skills = data.get('required_skills', [])
    print(f"\n✅ Required Skills ({len(required_skills)}):")
    for skill in required_skills[:5]:
        print(f"  • {skill}")
    if len(required_skills) > 5:
        print(f"  ... and {len(required_skills) - 5} more")
    
    print(f"\n📊 Tokens used: {result.tokens_used}")
else:
    print(f"❌ Job extraction failed: {result.error}")


💼 Extracting job description information...
✅ Job extraction successful!

🎯 Position: Senior Python Developer
🏢 Company: TechCorp
📍 Location: Remote (US timezone preferred)
💼 Type: Full-time
📈 Level: Senior (5+ years)
💰 Salary: $120,000 - $150,000 per year
🏠 Remote: Yes

✅ Required Skills (6):
  • Python programming (5+ years experience)
  • Django or Flask frameworks
  • PostgreSQL or MySQL databases
  • RESTful API development
  • Git version control
  ... and 1 more

📊 Tokens used: 851


In [None]:
## 🎯 Example 2: Using Custom YAML Schemas

Custom schemas allow you to create specialized extraction templates for any domain. Let's explore the available schemas and use them.
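
As a rough illustration of how schema discovery could work, the sketch below lists schema names from a directory, assuming one schema per `.yaml` or `.yml` file. It is not the library's implementation of `get_available_schemas`, and the function name `list_schema_names` is made up for this example.

```python
import tempfile
from pathlib import Path
from typing import List, Union


def list_schema_names(schema_dir: Union[str, Path]) -> List[str]:
    """Return the sorted base names of all .yaml/.yml files in schema_dir."""
    directory = Path(schema_dir)
    if not directory.is_dir():
        return []
    return sorted(
        path.stem
        for path in directory.iterdir()
        if path.is_file() and path.suffix in {".yaml", ".yml"}
    )


# Demonstrate with a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for filename in ("news_article.yaml", "product_review.yaml", "notes.txt"):
        (Path(tmp) / filename).write_text("name: placeholder\n", encoding="utf-8")
    demo_names = list_schema_names(tmp)

print(demo_names)
```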


In [4]:
# Load custom schema loader
loader = SchemaLoader("../config/schemas")

# List available custom schemas
available_schemas = loader.get_available_schemas()
print("🗂️ Available custom schemas:")
for schema_name in available_schemas:
    try:
        schema = loader.load_schema(schema_name)
        print(f"  📋 {schema_name}: {schema.description}")
    except Exception as e:
        print(f"  ❌ {schema_name}: Error loading - {e}")

print(f"\n✅ Found {len(available_schemas)} custom schemas")


2025-06-29 07:21:26.322 | INFO     | structured_output_cookbook.utils.schema_loader:load_schema:102 - Loaded schema: news_article
2025-06-29 07:21:26.327 | INFO     | structured_output_cookbook.utils.schema_loader:load_schema:102 - Loaded schema: customer_support
2025-06-29 07:21:26.331 | INFO     | structured_output_cookbook.utils.schema_loader:load_schema:102 - Loaded schema: product_review


🗂️ Available custom schemas:
  📋 news_article: Extract structured information from news articles
  📋 customer_support: Extract structured information from customer support tickets or emails
  📋 product_review: Extract structured information from product reviews

✅ Found 3 custom schemas


In [5]:
# Example: News Article Extraction
news_text = """
Apple Unveils Revolutionary iPhone 15 Pro with Titanium Design

CUPERTINO, Calif., September 12, 2023 - Apple Inc. today announced the highly anticipated 
iPhone 15 Pro, featuring a breakthrough titanium design and advanced artificial intelligence 
capabilities that set a new standard for smartphone technology.

CEO Tim Cook unveiled the device during Apple's annual keynote event at Apple Park, 
highlighting the phone's industry-leading camera system, enhanced battery life, and 
revolutionary Action Button that replaces the traditional mute switch.

"The iPhone 15 Pro represents the biggest leap forward in iPhone technology since its inception," 
said Cook during the presentation. "With its titanium aerospace-grade construction and 
cutting-edge A17 Pro chip, we're delivering unprecedented performance in a device that's 
lighter than ever before."

The new smartphone, starting at $999, features a 6.1-inch display and will be available 
for pre-order this Friday, with general availability beginning September 22. Early reviews 
from technology analysts suggest the device could drive significant sales growth for Apple 
in the competitive smartphone market.

Industry experts predict the iPhone 15 Pro could capture substantial market share from 
competitors, particularly in the premium smartphone segment where Apple has historically 
dominated.
"""

print("📰 Extracting news article information...")

try:
    # Load the news article schema
    news_schema = loader.load_schema("news_article")
    
    # Perform extraction
    result = extractor.extract_with_yaml_schema(news_text, news_schema)
    
    if result.success and result.data:
        print("✅ News extraction successful!\n")
        
        data = result.data
        print(f"📰 Headline: {data.get('headline', 'N/A')}")
        # Guard against a null summary before slicing
        print(f"📝 Summary: {(data.get('summary') or 'N/A')[:100]}...")
        print(f"📅 Date: {data.get('publication_date', 'N/A')}")
        print(f"✍️ Author: {data.get('author', 'N/A')}")
        print(f"📍 Location: {data.get('location', 'N/A')}")
        
        key_people = data.get('key_people', [])
        if key_people:
            print(f"\n👥 Key People: {', '.join(key_people)}")
        
        organizations = data.get('organizations', [])
        if organizations:
            print(f"🏢 Organizations: {', '.join(organizations)}")
        
        print(f"🏷️ Category: {data.get('category', 'N/A')}")
        print(f"😊 Sentiment: {data.get('sentiment', 'N/A')}")
        
        tokens = result.tokens_used or 0
        print(f"\n📊 Tokens used: {tokens}")
        print(f"💰 Estimated cost: ~${(tokens * 0.00001):.4f}")
    else:
        print(f"❌ News extraction failed: {result.error}")
        
except Exception as e:
    print(f"❌ Error: {e}")


2025-06-29 07:21:31.276 | INFO     | structured_output_cookbook.utils.schema_loader:load_schema:102 - Loaded schema: news_article
2025-06-29 07:21:31.278 | INFO     | structured_output_cookbook.extractor:extract_with_yaml_schema:111 - Starting extraction with YAML schema: News Article


📰 Extracting news article information...


2025-06-29 07:21:33.352 | INFO     | structured_output_cookbook.extractor:extract_with_yaml_schema:140 - YAML schema extraction completed successfully


✅ News extraction successful!

📰 Headline: Apple Unveils Revolutionary iPhone 15 Pro with Titanium Design
📝 Summary: Apple has announced the iPhone 15 Pro, featuring a titanium design and advanced AI capabilities. CEO...
📅 Date: September 12, 2023
✍️ Author: None
📍 Location: Cupertino, Calif.

👥 Key People: Tim Cook
🏢 Organizations: Apple Inc.
🏷️ Category: technology
😊 Sentiment: positive

📊 Tokens used: 677
💰 Estimated cost: ~$0.0068


In [6]:
# Example: Product Review Extraction
review_text = """
Amazing Wireless Earbuds - 5 Stars!

By Sarah Johnson - Verified Purchase - October 15, 2023

I've been using the Sony WF-1000XM4 earbuds for about 3 months now and I'm absolutely 
blown away by the quality! 

PROS:
✅ Incredible noise cancellation - I can't hear anything when I'm on the subway
✅ Battery life is fantastic - easily gets me through 8+ hours with the case
✅ Sound quality is crystal clear with amazing bass
✅ Comfortable fit even for long listening sessions
✅ Quick charge feature is a lifesaver

CONS:
❌ A bit pricey at $280, but honestly worth every penny
❌ The touch controls can be a bit sensitive sometimes
❌ Case is slightly bulky compared to AirPods

Overall, these are hands down the best wireless earbuds I've ever owned. The noise 
cancellation alone makes them worth the price. I've recommended them to all my friends 
and family. Would definitely buy again!

Rating: 5/5 stars ⭐⭐⭐⭐⭐
Would I recommend? Absolutely YES!
"""

print("⭐ Extracting product review information...")

try:
    # Load the product review schema
    review_schema = loader.load_schema("product_review")
    
    # Perform extraction
    result = extractor.extract_with_yaml_schema(review_text, review_schema)
    
    if result.success and result.data:
        print("✅ Review extraction successful!\n")
        
        data = result.data
        print(f"📱 Product: {data.get('product_name', 'N/A')}")
        print(f"🏷️ Brand: {data.get('brand', 'N/A')}")
        print(f"⭐ Rating: {data.get('rating', 'N/A')}")
        print(f"👤 Reviewer: {data.get('reviewer_name', 'N/A')}")
        print(f"✅ Verified: {'Yes' if data.get('verified_purchase') else 'No'}")
        
        pros = data.get('pros', [])
        if pros:
            print(f"\n✅ Pros ({len(pros)}):")
            for pro in pros:
                print(f"  • {pro}")
        
        cons = data.get('cons', [])
        if cons:
            print(f"\n❌ Cons ({len(cons)}):")
            for con in cons:
                print(f"  • {con}")
        
        print(f"\n😊 Sentiment: {data.get('overall_sentiment', 'N/A')}")
        print(f"👍 Would Recommend: {'Yes' if data.get('would_recommend') else 'No'}")
        
        price = data.get('price_mentioned')
        if price:
            print(f"💰 Price Mentioned: {price}")
        
        print(f"\n📊 Tokens used: {result.tokens_used}")
    else:
        print(f"❌ Review extraction failed: {result.error}")
        
except Exception as e:
    print(f"❌ Error: {e}")


2025-06-29 07:21:33.372 | INFO     | structured_output_cookbook.utils.schema_loader:load_schema:102 - Loaded schema: product_review
2025-06-29 07:21:33.374 | INFO     | structured_output_cookbook.extractor:extract_with_yaml_schema:111 - Starting extraction with YAML schema: Product Review


⭐ Extracting product review information...


2025-06-29 07:21:36.318 | INFO     | structured_output_cookbook.extractor:extract_with_yaml_schema:140 - YAML schema extraction completed successfully


✅ Review extraction successful!

📱 Product: WF-1000XM4
🏷️ Brand: Sony
⭐ Rating: 5
👤 Reviewer: Sarah Johnson
✅ Verified: Yes

✅ Pros (5):
  • Incredible noise cancellation
  • Battery life is fantastic
  • Sound quality is crystal clear with amazing bass
  • Comfortable fit
  • Quick charge feature

❌ Cons (3):
  • A bit pricey at $280
  • Touch controls can be a bit sensitive
  • Case is slightly bulky compared to AirPods

😊 Sentiment: positive
👍 Would Recommend: Yes
💰 Price Mentioned: $280

📊 Tokens used: 664


In [None]:
## 🔍 Example 3: Schema Validation and Inspection

Let's explore how to validate schemas and inspect their structure programmatically.
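
To make the idea of structural validation concrete, here is a sketch of the kind of checks such a validator might perform on a parsed schema definition. It returns an `(is_valid, error)` pair to mirror how `validate_schema_structure` is used in this notebook, but the rules themselves are assumptions based on the schema layout shown here, not the library's actual logic. The function name `check_schema_structure` is hypothetical.

```python
from typing import Any, Dict, Optional, Tuple


def check_schema_structure(definition: Dict[str, Any]) -> Tuple[bool, Optional[str]]:
    """Return (is_valid, error) for a parsed schema definition dictionary."""
    for key in ("name", "description", "system_prompt", "schema"):
        if key not in definition:
            return False, f"missing top-level key: {key}"

    schema = definition["schema"]
    if not isinstance(schema, dict) or schema.get("type") != "object":
        return False, "schema.type must be 'object'"

    properties = schema.get("properties")
    if not isinstance(properties, dict) or not properties:
        return False, "schema.properties must be a non-empty mapping"

    # Every name in "required" should refer to a declared property.
    unknown = [name for name in schema.get("required", []) if name not in properties]
    if unknown:
        return False, f"required lists unknown properties: {unknown}"

    return True, None


good = {
    "name": "Demo",
    "description": "Demo schema",
    "system_prompt": "Extract the title.",
    "schema": {
        "type": "object",
        "properties": {"title": {"type": "string"}},
        "required": ["title"],
        "additionalProperties": False,
    },
}
bad = {**good, "schema": {"type": "array"}}

print(check_schema_structure(good))
print(check_schema_structure(bad))
```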


In [7]:
# Schema validation and inspection
print("🔍 Validating all available schemas...\n")

for schema_name in available_schemas:
    print(f"📋 Schema: {schema_name}")
    
    # Validate schema structure
    is_valid, error = loader.validate_schema_structure(schema_name)
    status = "✅ Valid" if is_valid else f"❌ Invalid: {error}"
    print(f"   Status: {status}")
    
    if is_valid:
        try:
            # Load and inspect schema
            schema = loader.load_schema(schema_name)
            
            print(f"   Name: {schema.name}")
            print(f"   Description: {schema.description}")
            
            # Count properties
            properties = schema.schema.get('properties', {})
            required = schema.schema.get('required', [])
            
            print(f"   Properties: {len(properties)} total, {len(required)} required")
            
            # Show some properties
            if properties:
                print("   Key fields:")
                for prop_name, prop_def in list(properties.items())[:3]:
                    prop_type = prop_def.get('type', 'unknown')
                    is_required = prop_name in required
                    req_mark = " (required)" if is_required else ""
                    print(f"     • {prop_name}: {prop_type}{req_mark}")
                
                if len(properties) > 3:
                    print(f"     ... and {len(properties) - 3} more")
            
        except Exception as e:
            print(f"   ❌ Error loading schema: {e}")
    
    print()  # Empty line for readability


2025-06-29 07:21:36.336 | INFO     | structured_output_cookbook.utils.schema_loader:load_schema:102 - Loaded schema: news_article
2025-06-29 07:21:36.342 | INFO     | structured_output_cookbook.utils.schema_loader:load_schema:102 - Loaded schema: news_article
2025-06-29 07:21:36.347 | INFO     | structured_output_cookbook.utils.schema_loader:load_schema:102 - Loaded schema: customer_support
2025-06-29 07:21:36.352 | INFO     | structured_output_cookbook.utils.schema_loader:load_schema:102 - Loaded schema: customer_support
2025-06-29 07:21:36.357 | INFO     | structured_output_cookbook.utils.schema_loader:load_schema:102 - Loaded schema: product_review
2025-06-29 07:21:36.362 | INFO     | structured_output_coo

🔍 Validating all available schemas...

📋 Schema: news_article
   Status: ✅ Valid
   Name: News Article
   Description: Extract structured information from news articles
   Properties: 9 total, 9 required
   Key fields:
     • headline: string (required)
     • summary: string (required)
     • publication_date: ['string', 'null'] (required)
     ... and 6 more

📋 Schema: customer_support
   Status: ✅ Valid
   Name: Customer Support Ticket
   Description: Extract structured information from customer support tickets or emails
   Properties: 12 total, 12 required
   Key fields:
     • ticket_id: ['string', 'null'] (required)
     • customer_name: ['string', 'null'] (required)
     • customer_email: ['string', 'null'] (required)
     ... and 9 more

📋 Schema: product_review
   Status: ✅ Valid
   Name: Product Review
   Description: Extract structured information from product reviews
   Properties: 11 total, 11 required
   Key fields:
     • product_name: string (required)
     • brand: ['s

In [None]:
## 📊 Example 4: Batch Processing and Analysis

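This example runs the `product_review` schema over several short reviews, then aggregates the results into a sentiment distribution, an average rating, and a recommendation rate.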
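
The aggregation step can be isolated into a small function that works on a list of extracted dictionaries. The sketch below assumes the field names `rating`, `overall_sentiment`, and `would_recommend` that appear in this notebook's product review examples; the function name `summarize_reviews` and the demo records are illustrative.

```python
from collections import Counter
from typing import Any, Dict, List, Optional


def summarize_reviews(results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Aggregate sentiment counts, mean rating, and recommendation rate."""
    sentiments = Counter(r["overall_sentiment"] for r in results if r.get("overall_sentiment"))

    # bool is a subclass of int, so exclude it explicitly when collecting ratings.
    ratings = [
        r["rating"]
        for r in results
        if isinstance(r.get("rating"), (int, float)) and not isinstance(r.get("rating"), bool)
    ]
    # Null recommendations are left out of the rate rather than counted as "no".
    recommends = [r["would_recommend"] for r in results if r.get("would_recommend") is not None]

    average_rating: Optional[float] = sum(ratings) / len(ratings) if ratings else None
    recommend_rate: Optional[float] = (
        100 * sum(recommends) / len(recommends) if recommends else None
    )
    return {
        "sentiments": dict(sentiments),
        "average_rating": average_rating,
        "recommend_rate": recommend_rate,
    }


# Illustrative records shaped like the product_review fields used in this notebook.
demo = [
    {"rating": 5, "overall_sentiment": "positive", "would_recommend": True},
    {"rating": 3, "overall_sentiment": "mixed", "would_recommend": None},
    {"rating": 1, "overall_sentiment": "negative", "would_recommend": False},
]
summary = summarize_reviews(demo)
print(summary)
```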


In [8]:
# Batch processing example
sample_reviews = [
    "Great product! Love the design and functionality. 5 stars! - John D.",
    "Okay product but overpriced. Works as expected but nothing special. 3/5 - Mary S.", 
    "Terrible quality. Broke after 2 weeks. Don't waste your money! 1 star - Bob K.",
    "Excellent value for money. Highly recommend this product! ⭐⭐⭐⭐⭐ - Lisa M."
]

print("📊 Processing multiple product reviews...\n")

# Load review schema
review_schema = loader.load_schema("product_review")

results = []
total_tokens = 0

for i, review_text in enumerate(sample_reviews, 1):
    print(f"Processing review {i}/{len(sample_reviews)}...")
    
    try:
        result = extractor.extract_with_yaml_schema(review_text, review_schema)
        
        if result.success and result.data:
            results.append(result.data)
            total_tokens += result.tokens_used or 0
            
            rating = result.data.get('rating', 'N/A')
            sentiment = result.data.get('overall_sentiment', 'N/A')
            recommend = result.data.get('would_recommend', False)
            
            print(f"  ✅ Rating: {rating}, Sentiment: {sentiment}, Recommend: {recommend}")
        else:
            print(f"  ❌ Failed: {result.error}")
            
    except Exception as e:
        print(f"  ❌ Error: {e}")

print(f"\n📈 Batch Processing Summary:")
print(f"  📝 Reviews processed: {len(sample_reviews)}")
print(f"  ✅ Successful extractions: {len(results)}")
print(f"  📊 Total tokens used: {total_tokens}")
print(f"  💰 Total estimated cost: ~${(total_tokens * 0.00001):.4f}")

# Analyze results
if results:
    print(f"\n🔍 Analysis of Results:")
    
    # Count sentiments
    sentiments = [r.get('overall_sentiment') for r in results if r.get('overall_sentiment')]
    if sentiments:
        from collections import Counter
        sentiment_counts = Counter(sentiments)
        print(f"  😊 Sentiment distribution:")
        for sentiment, count in sentiment_counts.items():
            print(f"    • {sentiment}: {count}")
    
    # Average rating (if available)
    # Keep numeric ratings only (bool is excluded because it is a subclass of int)
    ratings = [r['rating'] for r in results
               if isinstance(r.get('rating'), (int, float)) and not isinstance(r.get('rating'), bool)]
    if ratings:
        avg_rating = sum(ratings) / len(ratings)
        print(f"  ⭐ Average rating: {avg_rating:.1f}/5")
    
    # Recommendation rate
    recommendations = [r.get('would_recommend') for r in results if r.get('would_recommend') is not None]
    if recommendations:
        recommend_rate = sum(recommendations) / len(recommendations) * 100
        print(f"  👍 Recommendation rate: {recommend_rate:.1f}%")


2025-06-29 07:21:36.445 | INFO     | structured_output_cookbook.utils.schema_loader:load_schema:102 - Loaded schema: product_review
2025-06-29 07:21:36.447 | INFO     | structured_output_cookbook.extractor:extract_with_yaml_schema:111 - Starting extraction with YAML schema: Product Review


📊 Processing multiple product reviews...

Processing review 1/4...


2025-06-29 07:21:37.869 | INFO     | structured_output_cookbook.extractor:extract_with_yaml_schema:140 - YAML schema extraction completed successfully
2025-06-29 07:21:37.870 | INFO     | structured_output_cookbook.extractor:extract_with_yaml_schema:111 - Starting extraction with YAML schema: Product Review


  ✅ Rating: 5, Sentiment: positive, Recommend: True
Processing review 2/4...


2025-06-29 07:21:39.310 | INFO     | structured_output_cookbook.extractor:extract_with_yaml_schema:140 - YAML schema extraction completed successfully
2025-06-29 07:21:39.312 | INFO     | structured_output_cookbook.extractor:extract_with_yaml_schema:111 - Starting extraction with YAML schema: Product Review


  ✅ Rating: 3, Sentiment: mixed, Recommend: None
Processing review 3/4...


2025-06-29 07:21:40.403 | INFO     | structured_output_cookbook.extractor:extract_with_yaml_schema:140 - YAML schema extraction completed successfully
2025-06-29 07:21:40.405 | INFO     | structured_output_cookbook.extractor:extract_with_yaml_schema:111 - Starting extraction with YAML schema: Product Review


  ✅ Rating: 1, Sentiment: negative, Recommend: False
Processing review 4/4...


2025-06-29 07:21:41.541 | INFO     | structured_output_cookbook.extractor:extract_with_yaml_schema:140 - YAML schema extraction completed successfully


  ✅ Rating: 5, Sentiment: positive, Recommend: True

📈 Batch Processing Summary:
  📝 Reviews processed: 4
  ✅ Successful extractions: 4
  📊 Total tokens used: 1633
  💰 Total estimated cost: ~$0.0163

🔍 Analysis of Results:
  😊 Sentiment distribution:
    • positive: 2
    • mixed: 1
    • negative: 1
  ⭐ Average rating: 3.5/5
  👍 Recommendation rate: 66.7%


In [None]:
## 🛠️ Example 5: Creating a Custom Schema

Here's how you can create your own custom YAML schema for specific extraction needs.


In [9]:
# Example: Custom schema structure (for reference)
custom_schema_example = """
# Meeting Minutes Extraction Schema
name: "Meeting Minutes"
description: "Extract structured information from meeting minutes and notes"

# System prompt for extraction
system_prompt: |
  Extract structured information from the following meeting minutes.
  Focus on identifying:
  - Meeting details (date, attendees, duration)
  - Key decisions and action items
  - Discussion topics and outcomes
  
  If information is not available, leave fields as null or empty arrays.

# JSON Schema definition
schema:
  type: object
  properties:
    meeting_title:
      type: string
      description: "Title or subject of the meeting"
    date:
      type: ["string", "null"]
      description: "Meeting date"
    attendees:
      type: array
      items:
        type: string
      description: "List of meeting attendees"
    duration:
      type: ["string", "null"]
      description: "Meeting duration"
    decisions:
      type: array
      items:
        type: string
      description: "Key decisions made during the meeting"
    action_items:
      type: array
      items:
        type: object
        properties:
          task:
            type: string
            description: "Description of the action item"
          assignee:
            type: ["string", "null"]
            description: "Person assigned to the task"
          deadline:
            type: ["string", "null"]
            description: "Deadline for the task"
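        required: ["task", "assignee", "deadline"]
        additionalProperties: false
      description: "Action items from the meeting"
    topics_discussed:
      type: array
      items:
        type: string
      description: "Main topics discussed in the meeting"
  # Every property is listed as required, as in the bundled schemas; optional values use nullable types
  required: ["meeting_title", "date", "attendees", "duration", "decisions", "action_items", "topics_discussed"]
  additionalProperties: false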
"""

print("🛠️ Custom Schema Example:")
print("To create a custom schema, save the above YAML structure to:")
print("📁 config/schemas/meeting_minutes.yaml")
print()
print("📋 Schema Structure Overview:")
print("  • name: Human-readable schema name")
print("  • description: Brief description of what it extracts") 
print("  • system_prompt: Instructions for the LLM")
print("  • schema: JSON Schema definition with properties and validation")
print()
print("🎯 Key Requirements:")
print("  • Must have type: 'object' with properties")
print("  • Include required fields array")
print("  • Set additionalProperties: false for strict validation")
print("  • Use descriptive field descriptions")


🛠️ Custom Schema Example:
To create a custom schema, save the above YAML structure to:
📁 config/schemas/meeting_minutes.yaml

📋 Schema Structure Overview:
  • name: Human-readable schema name
  • description: Brief description of what it extracts
  • system_prompt: Instructions for the LLM
  • schema: JSON Schema definition with properties and validation

🎯 Key Requirements:
  • Must have type: 'object' with properties
  • Include required fields array
  • Set additionalProperties: false for strict validation
  • Use descriptive field descriptions
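
Saving the schema text to disk can be done with `pathlib`. The sketch below writes a YAML string to `<schema_dir>/<schema_name>.yaml`; the helper name `save_schema` is illustrative, and the demo uses a temporary directory and a shortened stand-in for the schema text so that it runs anywhere.

```python
import tempfile
from pathlib import Path


def save_schema(yaml_text: str, schema_dir: Path, schema_name: str) -> Path:
    """Write yaml_text to <schema_dir>/<schema_name>.yaml and return the path."""
    schema_dir.mkdir(parents=True, exist_ok=True)
    target = schema_dir / f"{schema_name}.yaml"
    # Drop the leading newline left by a triple-quoted string literal.
    target.write_text(yaml_text.lstrip("\n"), encoding="utf-8")
    return target


# A shortened stand-in for the meeting minutes schema text.
demo_yaml = """
name: "Meeting Minutes"
description: "Extract structured information from meeting minutes and notes"
"""

with tempfile.TemporaryDirectory() as tmp:
    saved = save_schema(demo_yaml, Path(tmp) / "config" / "schemas", "meeting_minutes")
    saved_name = saved.name
    saved_text = saved.read_text(encoding="utf-8")

print(saved_name)
```

In this notebook, the equivalent call would pass `custom_schema_example` and the directory given to `SchemaLoader`. Presumably `loader.load_schema("meeting_minutes")` would then pick the file up, though that depends on how the loader resolves names.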
