<div style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); padding: 20px; border-radius: 10px; margin-bottom: 20px;">
    <h1 style="color: white; margin: 0; font-size: 36px;">🔍 Notebook 4: Debugging, Logging & Persistence</h1>
    <p style="color: rgba(255,255,255,0.9); margin-top: 10px; font-size: 18px;">Understanding, Monitoring, and Saving Your LLM Applications</p>
</div>

<div style="display: flex; justify-content: space-between; margin-bottom: 20px;">
    <a href="03_modules_adapters.ipynb" style="text-decoration: none; padding: 10px 20px; background: #f0f0f0; border-radius: 5px;">← Notebook 3</a>
    <span style="padding: 10px 20px; background: #fff8e1; border-radius: 5px;">🟡 Intermediate • 25 minutes</span>
    <a href="05_optimization.ipynb" style="text-decoration: none; padding: 10px 20px; background: #f0f0f0; border-radius: 5px;">Notebook 5 →</a>
</div>

## 🎯 What You'll Learn

<div style="background: #f5f5f5; padding: 20px; border-radius: 10px; border-left: 4px solid #667eea;">
    <h3>🔍 Debugging - See Inside Your LLM</h3>
    <ul style="margin: 10px 0; padding-left: 20px;">
        <li>✅ <strong>Debug Mode</strong>: Inspect every request and response</li>
        <li>✅ <strong>Token Tracking</strong>: Monitor usage and costs</li>
        <li>✅ <strong>Prompt Inspection</strong>: See exactly what's sent to the LLM</li>
        <li>✅ <strong>Error Analysis</strong>: Understand and fix failures</li>
    </ul>    
    <h3>📊 Logging & Callbacks - Production Monitoring</h3>
    <ul style="margin: 10px 0; padding-left: 20px;">
        <li>✅ <strong>JSONL Logging</strong>: Structured logs for analysis</li>
        <li>✅ <strong>Custom Callbacks</strong>: Monitor your specific needs</li>
        <li>✅ <strong>Performance Tracking</strong>: Latency and throughput metrics</li>
        <li>✅ <strong>Cost Management</strong>: Track API spending</li>
    </ul>    
    <h3>💾 Persistence - Save & Load Your Work</h3>
    <ul style="margin: 10px 0; padding-left: 20px;">
        <li>✅ <strong>Save Modules</strong>: Persist configured modules</li>
        <li>✅ <strong>Load & Resume</strong>: Continue from saved state</li>
        <li>✅ <strong>Optimization Results</strong>: Store and reuse optimized modules</li>
        <li>✅ <strong>Production Deployment</strong>: Package for deployment</li>
    </ul>
</div>

## 🔧 Setup

In [1]:
import asyncio
import json
import os
from pathlib import Path
from typing import Dict, Any
import tempfile
import time

import logillm
from logillm.core.predict import Predict, ChainOfThought
from logillm.core.signatures import Signature, InputField, OutputField
from logillm.core.callbacks import CallbackManager, AbstractCallback, ModuleEndEvent
from logillm.core.jsonl_callback import JSONLCallback, register_jsonl_logger
from logillm.providers import create_provider, register_provider

# Check API key
if not os.getenv("OPENAI_API_KEY"):
    print("⚠️ WARNING: OPENAI_API_KEY not set!")
    print("Set it with: export OPENAI_API_KEY=your_key")
else:
    print("✅ OpenAI API key found")

# Setup provider
try:
    provider = create_provider("openai", model="gpt-4.1-mini")
    register_provider(provider, set_default=True)
    print(f"✅ LogiLLM {logillm.__version__} ready with {provider.model}!")
except Exception as e:
    print(f"❌ Error: {e}")

# Create temp directory for examples
TEMP_DIR = Path(tempfile.mkdtemp(prefix="logillm_notebook_"))
print(f"📁 Working directory: {TEMP_DIR}")

✅ OpenAI API key found
✅ LogiLLM 0.2.16 ready with gpt-4.1-mini!
📁 Working directory: /tmp/logillm_notebook_c36fth53


## 🔍 Part 1: Debugging - See Inside Your LLM

Debug mode captures the exact request, response, and token usage for each call, which makes prompt problems and parsing failures much easier to diagnose during development.

In [2]:
# Define a simple signature for testing
class SimpleQA(Signature):
    """Answer questions concisely."""
    question: str = InputField(desc="The question to answer")
    answer: str = OutputField(desc="A concise answer")

# Create module WITH debug mode enabled
qa_debug = Predict(SimpleQA, debug=True)

print("🔍 Debug Mode Example:")
print("=" * 60)

# Make a call - debug info will be captured
result = await qa_debug(question="What is the capital of France?")

# Now we have access to complete request/response data!
print(f"\n✅ Answer: {result.outputs['answer']}")
print(f"\n📊 Debug Information Available:")
print(f"  • Has request data: {hasattr(result, 'request')}")
print(f"  • Has response data: {hasattr(result, 'response')}")
print(f"  • Has prompt data: {hasattr(result, 'prompt')}")

🔍 Debug Mode Example:

✅ Answer: The capital of France is Paris.

📊 Debug Information Available:
  • Has request data: True
  • Has response data: True
  • Has prompt data: True


In [3]:
# Inspect the request data (what was sent to the LLM)
print("📤 REQUEST DATA:")
print("=" * 60)

if hasattr(result, 'request') and result.request:
    print(f"Provider: {result.request.get('provider', 'N/A')}")
    print(f"Model: {result.request.get('model', 'N/A')}")
    print(f"Adapter: {result.request.get('adapter', 'N/A')}")
    
    # Show the actual messages sent
    messages = result.request.get('messages', [])
    print(f"\nMessages sent ({len(messages)} total):")
    for i, msg in enumerate(messages[:2]):  # Show first 2 messages
        print(f"  [{i}] Role: {msg.get('role', 'unknown')}")
        content = msg.get('content', '')[:100]  # First 100 chars
        print(f"      Content: {content}...")

print("\n📥 RESPONSE DATA:")
print("=" * 60)

if hasattr(result, 'response') and result.response:
    text_content = result.response.get('text', '')
    print(f"Text length: {len(text_content)} chars")
    print(f"Finish reason: {result.response.get('finish_reason', 'N/A')}")
    
    # Handle latency that might be None
    latency = result.response.get('latency')
    if latency is not None:
        print(f"Latency: {latency:.2f} seconds")
    else:
        print("Latency: N/A")
    
    # Token usage and costs
    usage = result.response.get('usage', {})
    print(f"\n💰 Token Usage:")
    print(f"  Input tokens: {usage.get('input_tokens', 0)}")
    print(f"  Output tokens: {usage.get('output_tokens', 0)}")
    print(f"  Total tokens: {usage.get('total_tokens', 0)}")
    
    cost = result.response.get('cost')
    if cost is not None and cost > 0:
        print(f"  Estimated cost: ${cost:.5f}")
    else:
        print("  Estimated cost: N/A")

📤 REQUEST DATA:
Provider: openai
Model: gpt-4.1-mini
Adapter: chat

Messages sent (2 total):
  [0] Role: system
      Content: Answer questions concisely....
  [1] Role: user
      Content: [system]: Task: Answer questions concisely.

Input fields:
- question: The question

Output fields (...

📥 RESPONSE DATA:
Text length: 39 chars
Finish reason: stop
Latency: N/A

💰 Token Usage:
  Input tokens: 84
  Output tokens: 9
  Total tokens: 93
  Estimated cost: N/A
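
The run above reported token counts but no cost. A rough estimate can still be derived from the `usage` dictionary. The sketch below is plain Python, not a LogiLLM API: `estimate_cost` is a hypothetical helper, and the per-million-token prices are placeholders that should be replaced with your provider's current rates.

```python
def estimate_cost(usage: dict, input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Estimate the dollar cost of one call from its token counts."""
    input_tokens = usage.get("input_tokens", 0)
    output_tokens = usage.get("output_tokens", 0)
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


# Token counts copied from the run above; the prices are placeholders, not real rates.
usage = {"input_tokens": 84, "output_tokens": 9, "total_tokens": 93}
estimate = estimate_cost(usage, input_price_per_million=1.0, output_price_per_million=2.0)
print(f"Estimated cost: ${estimate:.6f}")
```

Input and output tokens are priced separately because providers commonly charge different rates for each.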


### 🎯 Three Ways to Enable Debug Mode

In [4]:
# Method 1: Enable at module creation
qa1 = Predict(SimpleQA, debug=True)
print("Method 1: Debug enabled at creation")

# Method 2: Toggle debug dynamically
qa2 = Predict(SimpleQA)  # Created without debug
qa2.enable_debug_mode()  # Turn it on
print("Method 2: Debug enabled dynamically")

# Method 3: Environment variable (affects all modules)
# os.environ['LOGILLM_DEBUG'] = '1'
# qa3 = Predict(SimpleQA)  # Will have debug enabled
print("Method 3: Set LOGILLM_DEBUG=1 environment variable")

# You can also disable debug mode
qa1.disable_debug_mode()
print("\n✅ Debug can be toggled on/off as needed!")

Method 1: Debug enabled at creation
Method 2: Debug enabled dynamically
Method 3: Set LOGILLM_DEBUG=1 environment variable

✅ Debug can be toggled on/off as needed!
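
Setting `LOGILLM_DEBUG` directly in `os.environ` leaves it on for the rest of the session. One way to limit its scope is a small context manager that restores the previous value afterwards. This is a generic Python sketch; `temporary_env` is a hypothetical helper, and it assumes LogiLLM reads the variable when a module is created, as the comment in the cell above suggests.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(name: str, value: str):
    """Set an environment variable inside a with-block, then restore the old state."""
    previous = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous


with temporary_env("LOGILLM_DEBUG", "1"):
    # Modules created here would see LOGILLM_DEBUG=1 (assumption: read at creation time).
    inside = os.environ.get("LOGILLM_DEBUG")
```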


### 🐛 Practical Example: Debugging a Complex Task

In [5]:
# A more complex signature that might have issues
class DataExtraction(Signature):
    """Extract structured data from text."""
    text: str = InputField(desc="Text to analyze")
    
    entities: list[str] = OutputField(desc="Named entities found")
    numbers: list[float] = OutputField(desc="Numbers mentioned")
    dates: list[str] = OutputField(desc="Dates mentioned")

# Enable debug to understand what's happening
extractor = ChainOfThought(DataExtraction, debug=True)

test_text = """
Apple reported $94.8 billion in revenue for Q4 2023, beating estimates.
CEO Tim Cook announced the results on November 2nd, highlighting strong iPhone sales.
"""

print("🔍 Debugging Complex Extraction:")
print("=" * 60)

result = await extractor(text=test_text)

if result.success:
    print("\n✅ Extraction Results:")
    print(f"  Entities: {result.outputs['entities']}")
    print(f"  Numbers: {result.outputs['numbers']}")
    print(f"  Dates: {result.outputs['dates']}")
    
    # Debug helps us understand the reasoning
    if 'reasoning' in result.outputs:
        print(f"\n💭 LLM Reasoning:")
        print(f"  {result.outputs['reasoning'][:200]}...")
else:
    print("\n❌ Extraction failed!")
    # Debug mode lets us see why
    if hasattr(result, 'response'):
        print(f"Error details: {result.response}")

🔍 Debugging Complex Extraction:

✅ Extraction Results:
  Entities: ['Apple', 'Tim Cook']
  Numbers: [94.8, 4]
  Dates: ['Q4 2023', 'November 2nd']

💭 LLM Reasoning:
  Step 1: Identify named entities such as organizations and people. Here, 'Apple' and 'Tim Cook' are named entities. Step 2: Extract numbers mentioned, including financial figures like '$94.8 billion'. ...
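
The `Numbers` output above contains `4`, which the model lifted from "Q4" rather than from a standalone figure. A cheap post-check can flag values like this for review. The sketch below is a heuristic in plain Python, not part of LogiLLM; the helper names and the regular expression are illustrative assumptions.

```python
import re

# A number is "standalone" when it is not glued to letters or other digits,
# so the 4 in "Q4" and the 2 in "2nd" are skipped.
STANDALONE_NUMBER = re.compile(r"(?<![A-Za-z\d.])\d+(?:\.\d+)?(?![A-Za-z\d])")


def standalone_numbers(text: str) -> set[float]:
    """Collect every standalone number that appears in the text."""
    return {float(match) for match in STANDALONE_NUMBER.findall(text)}


def unsupported_numbers(extracted: list[float], text: str) -> list[float]:
    """Return extracted numbers that never appear as standalone numbers in the text."""
    present = standalone_numbers(text)
    return [float(n) for n in extracted if float(n) not in present]


sample_text = """
Apple reported $94.8 billion in revenue for Q4 2023, beating estimates.
CEO Tim Cook announced the results on November 2nd, highlighting strong iPhone sales.
"""
flagged = unsupported_numbers([94.8, 4], sample_text)
print(f"Numbers to review: {flagged}")
```

A check like this does not decide whether `4` is wrong; it only surfaces values that deserve a look at the debug data.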


## 📊 Part 2: Logging & Callbacks - Production Monitoring

Production systems need structured logging to track performance and costs and to debug issues after the fact.

In [6]:
# JSONL logging captures everything in structured format
log_file = TEMP_DIR / "execution.jsonl"

# Register the JSONL logger
callback_id = register_jsonl_logger(
    str(log_file),
    include_module_events=True,    # Log module start/end
    include_provider_events=True,  # Log LLM requests/responses
    include_optimization_events=False  # Not optimizing yet
)

print(f"📝 Logging to: {log_file}")
print(f"📌 Callback ID: {callback_id}")

# Now all module executions will be logged!
qa = Predict(SimpleQA)

# Make several calls - all will be logged
questions = [
    "What is the speed of light?",
    "Who wrote Romeo and Juliet?",
    "What is machine learning?"
]

print("\n🔄 Making API calls...")
for q in questions:
    result = await qa(question=q)
    print(f"  Q: {q[:30]}... → A: {result.outputs['answer'][:50]}...")

print(f"\n✅ All calls logged to {log_file.name}")

📝 Logging to: /tmp/logillm_notebook_c36fth53/execution.jsonl
📌 Callback ID: JSONLCallback_138711150710464

🔄 Making API calls...
  Q: What is the speed of light?... → A: The speed of light in a vacuum is approximately 29...
  Q: Who wrote Romeo and Juliet?... → A: William Shakespeare wrote Romeo and Juliet....
  Q: What is machine learning?... → A: Machine learning is a field of artificial intellig...

✅ All calls logged to execution.jsonl


In [7]:
# Analyze the logged data
print("📊 Analyzing Logged Data:")
print("=" * 60)

events = []
with open(log_file, 'r') as f:
    for line in f:
        events.append(json.loads(line))

print(f"\nTotal events logged: {len(events)}")

# Analyze event types
event_types = {}
for event in events:
    event_type = event.get('event_type', 'unknown')
    event_types[event_type] = event_types.get(event_type, 0) + 1

print("\n📈 Event Type Distribution:")
for event_type, count in event_types.items():
    print(f"  {event_type}: {count}")

# Calculate total cost and tokens
total_cost = 0
total_tokens = 0
total_latency = 0
provider_response_count = 0

for event in events:
    if event.get('event_type') == 'provider_response':
        provider_response_count += 1
        usage = event.get('usage', {})
        total_tokens += usage.get('total_tokens', 0)
        
        cost = event.get('cost')
        if cost is not None:
            total_cost += cost
        
        duration = event.get('duration')
        if duration is not None:
            total_latency += duration

print("\n💰 Usage Summary:")
print(f"  Total tokens: {total_tokens:,}")

if total_cost > 0:
    print(f"  Total cost: ${total_cost:.5f}")
else:
    print("  Total cost: $0.00000")

if total_latency > 0:
    print(f"  Total latency: {total_latency:.2f}s")
    if provider_response_count > 0:
        avg_latency = total_latency / provider_response_count
        print(f"  Average latency: {avg_latency:.2f}s per call")
else:
    print("  Total latency: N/A")

# Show a sample event
print("\n📝 Sample Event Structure:")
if events:
    sample = events[0]
    print(f"  Type: {sample.get('event_type')}")
    print(f"  Timestamp: {sample.get('timestamp')}")
    print(f"  Module: {sample.get('module_name', 'N/A')}")
    
    call_id = sample.get('call_id', 'N/A')
    if call_id and call_id != 'N/A':
        print(f"  Call ID: {str(call_id)[:20]}...")
    else:
        print("  Call ID: N/A")

📊 Analyzing Logged Data:

Total events logged: 12

📈 Event Type Distribution:
  module_start: 3
  provider_request: 3
  provider_response: 3
  module_end: 3

💰 Usage Summary:
  Total tokens: 321
  Total cost: $0.00000
  Total latency: 2.47s
  Average latency: 0.82s per call

📝 Sample Event Structure:
  Type: module_start
  Timestamp: 2025-09-03T08:01:17.355676
  Module: Predict
  Call ID: c289f926-a901-4e12-9...
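
The analysis above can be folded into a reusable function that also tolerates malformed lines, which appear when a process is interrupted mid-write. This sketch uses only the field names seen in the log above (`event_type`, `usage`, `duration`); `summarize_jsonl` is a hypothetical helper, and the sample events are synthetic.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def summarize_jsonl(path: Path) -> dict:
    """Aggregate event counts, tokens, and latency from a JSONL log, skipping bad lines."""
    counts = Counter()
    total_tokens = 0
    durations = []
    skipped = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                skipped += 1  # e.g. a line truncated by a crash mid-write
                continue
            if not isinstance(event, dict):
                skipped += 1
                continue
            counts[event.get("event_type", "unknown")] += 1
            if event.get("event_type") == "provider_response":
                total_tokens += (event.get("usage") or {}).get("total_tokens", 0)
                if event.get("duration") is not None:
                    durations.append(event["duration"])
    return {
        "event_counts": dict(counts),
        "total_tokens": total_tokens,
        "avg_latency": sum(durations) / len(durations) if durations else None,
        "skipped_lines": skipped,
    }


# Synthetic events for illustration only; real logs carry more fields.
sample_lines = [
    '{"event_type": "module_start"}',
    '{"event_type": "provider_response", "usage": {"total_tokens": 100}, "duration": 0.5}',
    '{"event_type": "provider_response", "usage": {"total_tokens": 50}, "duration": 1.5}',
    '{"event_type": "module_end"',  # deliberately truncated
]
with tempfile.TemporaryDirectory() as tmp:
    sample_log = Path(tmp) / "sample.jsonl"
    sample_log.write_text("\n".join(sample_lines) + "\n", encoding="utf-8")
    summary = summarize_jsonl(sample_log)
print(summary)
```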


### 🎨 Custom Callbacks for Specific Monitoring

In [8]:
# Create a custom callback for cost tracking
class CostTracker(AbstractCallback):
    def __init__(self):
        self.total_cost = 0
        self.call_count = 0
        self.costs_by_module = {}
    
    async def on_module_end(self, event: ModuleEndEvent):
        """Track costs when modules complete."""
        self.call_count += 1
        
        # Get module name
        module_name = event.module.__class__.__name__
        
        # Extract cost from prediction if available
        cost = None
        if hasattr(event, 'prediction') and event.prediction:
            if hasattr(event.prediction, 'response') and event.prediction.response:
                cost = event.prediction.response.get('cost')
        
        if cost is not None and cost > 0:
            self.total_cost += cost
            
            # Track by module
            if module_name not in self.costs_by_module:
                self.costs_by_module[module_name] = 0
            self.costs_by_module[module_name] += cost
            
            print(f"💰 {module_name} call #{self.call_count}: ${cost:.5f}")
        else:
            print(f"💰 {module_name} call #{self.call_count}: No cost data")
    
    def summary(self):
        """Print cost summary."""
        print("\n📊 Cost Tracking Summary:")
        print("=" * 40)
        print(f"Total calls: {self.call_count}")
        
        if self.total_cost > 0:
            print(f"Total cost: ${self.total_cost:.5f}")
            if self.call_count > 0:
                print(f"Average cost per call: ${self.total_cost/self.call_count:.5f}")
        else:
            print("Total cost: No cost data available")
        
        if self.costs_by_module:
            print("\nCost by module:")
            for module, cost in self.costs_by_module.items():
                if cost > 0:
                    print(f"  {module}: ${cost:.5f}")

# Register and use the custom callback
cost_tracker = CostTracker()
manager = CallbackManager()
manager.register(cost_tracker)

# Make some calls with cost tracking
qa = Predict(SimpleQA, debug=True)  # Debug mode exposes response data, including cost when the provider reports it

print("🔄 Making tracked calls...")
for q in ["What is AI?", "Explain quantum computing"]:
    result = await qa(question=q)

# Show summary
cost_tracker.summary()

🔄 Making tracked calls...
💰 Predict call #1: No cost data
💰 Predict call #2: No cost data

📊 Cost Tracking Summary:
Total calls: 2
Total cost: No cost data available
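
Cost is only one thing a callback can track. Latency percentiles and throughput follow the same pattern: record one number per call, summarize later. The class below is a plain-Python sketch, not a LogiLLM callback. Feeding it from `on_module_end` is left as an assumption, because how a duration is exposed on the event is not shown in this notebook.

```python
import math


class LatencyStats:
    """Collect per-call durations and report simple latency and throughput figures."""

    def __init__(self):
        self.durations: list[float] = []

    def record(self, seconds: float) -> None:
        self.durations.append(seconds)

    def percentile(self, p: float) -> float:
        """Nearest-rank percentile of the recorded durations."""
        if not self.durations:
            raise ValueError("no durations recorded")
        ordered = sorted(self.durations)
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]

    def throughput(self) -> float:
        """Calls per second, assuming the calls ran one after another."""
        total = sum(self.durations)
        return len(self.durations) / total if total > 0 else 0.0


latency_stats = LatencyStats()
for seconds in [0.5, 0.7, 0.9, 1.2, 3.0]:  # made-up durations for illustration
    latency_stats.record(seconds)

print(f"p50: {latency_stats.percentile(50):.2f}s")
print(f"p95: {latency_stats.percentile(95):.2f}s")
print(f"throughput: {latency_stats.throughput():.2f} calls/s")
```

Percentiles are usually more informative than the average here, since a single slow call (3.0s in this sample) dominates the mean.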


## 💾 Part 3: Persistence - Save & Load Your Work

Being able to save configured modules and load them later is essential for production deployments.

In [9]:
# Define a signature with multiple fields
class DocumentAnalysis(Signature):
    """Analyze documents for key information."""
    document: str = InputField(desc="Document to analyze")
    
    summary: str = OutputField(desc="Brief summary")
    key_points: list[str] = OutputField(desc="Main points")
    sentiment: str = OutputField(desc="Overall sentiment")
    category: str = OutputField(desc="Document category")

# Create and configure a module
analyzer = ChainOfThought(DocumentAnalysis, adapter="json")

print("📝 Created DocumentAnalysis module")
print(f"  Module type: {type(analyzer).__name__}")
print(f"  Adapter: {analyzer.adapter.format_type.value}")
print(f"  Has reasoning: {'reasoning' in analyzer.signature.output_fields}")

# Test the module before saving
test_doc = "LogiLLM is a powerful framework for building LLM applications. It provides clean APIs and reliable parsing."
result = await analyzer(document=test_doc)

print(f"\n✅ Module works! Summary: {result.outputs['summary'][:50]}...")

print("\n💾 Now let's save this configured module...")

📝 Created DocumentAnalysis module
  Module type: ChainOfThought
  Adapter: json
  Has reasoning: True
💰 ChainOfThought call #3: No cost data

✅ Module works! Summary: LogiLLM is a robust framework offering clean APIs ...

💾 Now let's save this configured module...


### 📦 Method 1: Pickle - Simple and Reliable

In [10]:
import pickle

# Save the module with pickle
save_path = TEMP_DIR / "analyzer.pkl"

with open(save_path, 'wb') as f:
    pickle.dump(analyzer, f)

print(f"💾 Saved module to: {save_path}")
print(f"📊 File size: {save_path.stat().st_size:,} bytes")

# Load the module back
with open(save_path, 'rb') as f:
    loaded_analyzer = pickle.load(f)

print(f"\n✅ Loaded module successfully!")
print(f"  Type: {type(loaded_analyzer).__name__}")
print(f"  Has signature: {hasattr(loaded_analyzer, 'signature')}")
print(f"  Has adapter: {hasattr(loaded_analyzer, 'adapter')}")

# Test that the loaded module works
test_doc2 = "Python is a versatile programming language loved by developers worldwide."
result2 = await loaded_analyzer(document=test_doc2)

print(f"\n🎯 Loaded module works!")
print(f"  Summary: {result2.outputs['summary'][:50]}...")
print(f"  Sentiment: {result2.outputs['sentiment']}")

💾 Saved module to: /tmp/logillm_notebook_c36fth53/analyzer.pkl
📊 File size: 3,302 bytes

✅ Loaded module successfully!
  Type: ChainOfThought
  Has signature: True
  Has adapter: True
💰 ChainOfThought call #4: No cost data

🎯 Loaded module works!
  Summary: Python is a popular and versatile programming lang...
  Sentiment: positive
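
Unpickling runs code embedded in the file, so a saved module should only be loaded from a trusted location. A checksum stored next to the file adds a basic integrity check before loading. The sketch below uses a plain dictionary as a stand-in for a module; `save_with_checksum` and `load_with_checksum` are hypothetical helpers. It detects accidental corruption, not an attacker who can rewrite both files.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def save_with_checksum(obj, path: Path) -> str:
    """Pickle obj to path and write its SHA-256 digest to a sidecar file."""
    data = pickle.dumps(obj)
    path.write_bytes(data)
    digest = hashlib.sha256(data).hexdigest()
    path.with_name(path.name + ".sha256").write_text(digest)
    return digest


def load_with_checksum(path: Path):
    """Unpickle path only if its bytes still match the recorded digest."""
    data = path.read_bytes()
    expected = path.with_name(path.name + ".sha256").read_text().strip()
    if hashlib.sha256(data).hexdigest() != expected:
        raise ValueError(f"Checksum mismatch for {path}; refusing to unpickle")
    return pickle.loads(data)


with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.pkl"
    save_with_checksum({"adapter": "json"}, demo_path)  # stand-in for a module
    restored = load_with_checksum(demo_path)

    demo_path.write_bytes(b"corrupted")  # simulate a damaged file
    try:
        load_with_checksum(demo_path)
        tamper_detected = False
    except ValueError:
        tamper_detected = True

print(f"Restored: {restored}, corruption detected: {tamper_detected}")
```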


### 🗂️ Managing Multiple Modules

In [11]:
# Create a module registry for managing multiple modules
class ModuleRegistry:
    """Simple registry for saving and loading multiple modules."""
    
    def __init__(self, base_path: Path):
        self.base_path = Path(base_path)
        self.base_path.mkdir(parents=True, exist_ok=True)
        self.registry_file = self.base_path / "registry.json"
        self.registry = self._load_registry()
    
    def _load_registry(self) -> dict:
        """Load or create registry."""
        if self.registry_file.exists():
            with open(self.registry_file) as f:
                return json.load(f)
        return {}
    
    def _save_registry(self):
        """Save registry to disk."""
        with open(self.registry_file, 'w') as f:
            json.dump(self.registry, f, indent=2)
    
    def save_module(self, name: str, module, description: str = ""):
        """Save a module to the registry."""
        module_path = self.base_path / f"{name}.pkl"
        
        # Save module
        with open(module_path, 'wb') as f:
            pickle.dump(module, f)
        
        # Update registry
        self.registry[name] = {
            "path": str(module_path),
            "type": type(module).__name__,
            "description": description,
            "saved_at": time.strftime("%Y-%m-%dT%H:%M:%S")  # local time of the save
        }
        self._save_registry()
        
        print(f"✅ Saved module '{name}'")
        return module_path
    
    def load_module(self, name: str):
        """Load a module from the registry."""
        if name not in self.registry:
            raise ValueError(f"Module '{name}' not found in registry")
        
        module_path = self.registry[name]["path"]
        with open(module_path, 'rb') as f:
            return pickle.load(f)
    
    def list_modules(self):
        """List all saved modules."""
        return self.registry

# Create a registry
registry = ModuleRegistry(TEMP_DIR / "modules")

# Save multiple configured modules
registry.save_module(
    "document_analyzer",
    analyzer,
    "Analyzes documents with ChainOfThought reasoning"
)

registry.save_module(
    "simple_qa",
    qa,
    "Simple question-answering module"
)

# Create and save another module
summarizer = Predict(SimpleQA)  # Reusing SimpleQA for demo
registry.save_module(
    "summarizer",
    summarizer,
    "Text summarization module"
)

# List all saved modules
print("\n📚 Module Registry:")
print("=" * 50)
for name, info in registry.list_modules().items():
    print(f"\n📦 {name}")
    print(f"   Type: {info['type']}")
    print(f"   Description: {info['description']}")

print(f"\n💾 Registry location: {registry.base_path}")

✅ Saved module 'document_analyzer'
✅ Saved module 'simple_qa'
✅ Saved module 'summarizer'

📚 Module Registry:

📦 document_analyzer
   Type: ChainOfThought
   Description: Analyzes documents with ChainOfThought reasoning

📦 simple_qa
   Type: Predict
   Description: Simple question-answering module

📦 summarizer
   Type: Predict
   Description: Text summarization module

💾 Registry location: /tmp/logillm_notebook_c36fth53/modules
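
The registry rewrites `registry.json` in place, so a crash during the write can leave a half-written file. A common remedy is to write to a temporary file in the same directory and then rename it over the original. The helper below is a generic sketch of that pattern, not part of the `ModuleRegistry` class above; `write_json_atomic` is a hypothetical name.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path: Path, data: dict) -> None:
    """Write JSON to a temp file in the same directory, then swap it into place."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2)
        os.replace(tmp_name, path)  # atomic when both paths are on the same filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise


with tempfile.TemporaryDirectory() as tmp:
    registry_file = Path(tmp) / "registry.json"
    write_json_atomic(registry_file, {"simple_qa": {"type": "Predict"}})
    write_json_atomic(
        registry_file,
        {"simple_qa": {"type": "Predict"}, "summarizer": {"type": "Predict"}},
    )
    reloaded = json.loads(registry_file.read_text(encoding="utf-8"))
    leftovers = [p.name for p in Path(tmp).iterdir() if p.name != "registry.json"]

print(sorted(reloaded), leftovers)
```

Readers of the file then see either the old registry or the new one, never a partial write.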


In [12]:
# Load modules from registry
print("🔄 Loading modules from registry...\n")

# Load the document analyzer
loaded_doc_analyzer = registry.load_module("document_analyzer")
print(f"✅ Loaded 'document_analyzer'")
print(f"   Type: {type(loaded_doc_analyzer).__name__}")

# Use the loaded module
test_doc3 = """
Artificial Intelligence is transforming industries worldwide.
From healthcare to finance, AI systems are improving efficiency and decision-making.
However, ethical considerations remain paramount.
"""

result3 = await loaded_doc_analyzer(document=test_doc3)

print(f"\n📝 Analysis Results:")
print(f"   Summary: {result3.outputs['summary']}")
print(f"   Sentiment: {result3.outputs['sentiment']}")
print(f"   Key Points: {len(result3.outputs.get('key_points', []))} found")

# Load and use another module
loaded_qa = registry.load_module("simple_qa")
qa_result = await loaded_qa(question="What is the capital of Japan?")
print(f"\n🎯 QA Module Result: {qa_result.outputs['answer']}")

🔄 Loading modules from registry...

✅ Loaded 'document_analyzer'
   Type: ChainOfThought
💰 ChainOfThought call #5: No cost data

📝 Analysis Results:
   Summary: The document highlights how AI is transforming industries by enhancing efficiency and decision-making while emphasizing the need for ethical considerations.
   Sentiment: Positive with caution
   Key Points: 3 found
💰 Predict call #5: No cost data

🎯 QA Module Result: The capital of Japan is Tokyo.
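
Saving under a fixed name overwrites the previous module. If earlier versions should stay loadable, a numbered filename scheme is one simple option. The sketch below is independent of the registry class above; `next_version_path` and the `name_vN.pkl` convention are illustrative assumptions.

```python
import re
import tempfile
from pathlib import Path


def next_version_path(base_path: Path, name: str) -> Path:
    """Return base_path/name_vN.pkl, where N is one above the highest existing version."""
    pattern = re.compile(rf"^{re.escape(name)}_v(\d+)\.pkl$")
    versions = [
        int(m.group(1))
        for p in base_path.glob(f"{name}_v*.pkl")
        if (m := pattern.match(p.name))
    ]
    return base_path / f"{name}_v{max(versions, default=0) + 1}.pkl"


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    first = next_version_path(base, "simple_qa")
    first.write_bytes(b"")  # placeholder for pickle.dump(module, f)
    second = next_version_path(base, "simple_qa")
    version_names = (first.name, second.name)

print(version_names)
```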


## 🏗️ Part 4: Production Patterns

Let's combine debugging, logging, and persistence into production-ready patterns.

In [13]:
# Production-ready module with full observability
class ProductionModule:
    """A production-ready wrapper for LogiLLM modules."""
    
    def __init__(self, module_name: str, signature, config: dict = None):
        self.module_name = module_name
        self.config = config or {}
        
        # Setup paths
        self.base_path = Path(self.config.get('base_path', './production'))
        self.base_path.mkdir(parents=True, exist_ok=True)
        
        # Setup logging
        self.log_file = self.base_path / f"{module_name}.jsonl"
        self.callback_id = None
        
        # Create module with configuration
        module_class = self.config.get('module_class', Predict)
        self.module = module_class(
            signature,
            adapter=self.config.get('adapter', 'json'),
            debug=self.config.get('debug', False)
        )
        
        # Setup monitoring
        self._setup_monitoring()
        
    def _setup_monitoring(self):
        """Setup logging and callbacks."""
        # Register JSONL logger
        self.callback_id = register_jsonl_logger(
            str(self.log_file),
            include_module_events=True,
            include_provider_events=True
        )
        print(f"📝 Logging to: {self.log_file}")
        
    async def __call__(self, **kwargs):
        """Execute module with monitoring."""
        start_time = time.perf_counter()  # monotonic clock, suited to measuring durations
        
        try:
            # Execute module
            result = await self.module(**kwargs)
            
            # Log success metrics
            duration = time.perf_counter() - start_time
            self._log_metric("success", 1, duration)
            
            return result
            
        except Exception as e:
            # Log failure
            duration = time.perf_counter() - start_time
            self._log_metric("failure", 1, duration)
            print(f"❌ Error in {self.module_name}: {e}")
            raise
    
    def _log_metric(self, metric_type: str, value: float, duration: float):
        """Log custom metrics."""
        metric = {
            "timestamp": time.time(),
            "module": self.module_name,
            "type": metric_type,
            "value": value,
            "duration": duration
        }
        # In production, send to metrics system
        print(f"📊 Metric: {metric_type}={value} ({duration:.2f}s)")
    
    def save(self):
        """Save module to disk."""
        save_path = self.base_path / f"{self.module_name}.pkl"
        with open(save_path, 'wb') as f:
            pickle.dump(self.module, f)
        print(f"💾 Saved to: {save_path}")
        return save_path
    
    def load(self):
        """Load module from disk."""
        load_path = self.base_path / f"{self.module_name}.pkl"
        if load_path.exists():
            with open(load_path, 'rb') as f:
                self.module = pickle.load(f)
            print(f"✅ Loaded from: {load_path}")
            return True
        return False
    
    def get_stats(self):
        """Get module statistics from logs."""
        if not self.log_file.exists():
            return {}
        
        stats = {
            "total_calls": 0,
            "total_tokens": 0,
            "total_cost": 0,
            "errors": 0
        }
        
        with open(self.log_file) as f:
            for line in f:
                event = json.loads(line)
                if event.get("event_type") == "module_end":
                    stats["total_calls"] += 1
                elif event.get("event_type") == "provider_response":
                    usage = event.get("usage", {})
                    stats["total_tokens"] += usage.get("total_tokens", 0)
                    stats["total_cost"] += event.get("cost") or 0  # cost may be missing or null
                elif event.get("event_type") == "error":
                    stats["errors"] += 1
        
        return stats

# Create a production module
prod_config = {
    'module_class': ChainOfThought,
    'adapter': 'json',
    'debug': True,  # Enable debug in development
    'base_path': TEMP_DIR / 'production'
}

prod_module = ProductionModule(
    "email_processor",
    DocumentAnalysis,
    config=prod_config
)

print("🚀 Production Module Ready!")
print(f"   Name: {prod_module.module_name}")
print(f"   Base Path: {prod_module.base_path}")

# Use the production module
email_text = """
Subject: Project Update

The development team has completed the API integration ahead of schedule.
All tests are passing and we're ready for deployment next week.
Please review the documentation and provide feedback by Friday.

Best regards,
Development Team
"""

result = await prod_module(document=email_text)

print(f"\n📧 Email Analysis:")
print(f"   Category: {result.outputs['category']}")
print(f"   Sentiment: {result.outputs['sentiment']}")

# Save the module
prod_module.save()

# Get statistics
stats = prod_module.get_stats()
print(f"\n📊 Module Statistics:")
for key, value in stats.items():
    print(f"   {key}: {value}")

📝 Logging to: /tmp/logillm_notebook_c36fth53/production/email_processor.jsonl
🚀 Production Module Ready!
   Name: email_processor
   Base Path: /tmp/logillm_notebook_c36fth53/production
💰 ChainOfThought call #6: No cost data
📊 Metric: success=1 (3.36s)

📧 Email Analysis:
   Category: Project Update
   Sentiment: Positive
💾 Saved to: /tmp/logillm_notebook_c36fth53/production/email_processor.pkl

📊 Module Statistics:
   total_calls: 1
   total_tokens: 336
   total_cost: 0
   errors: 0
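
The wrapper above logs a failure and re-raises it. For transient errors such as timeouts or rate limits, a retry with exponential backoff is a common addition. The sketch below is generic asyncio code, not a LogiLLM feature: `call_with_retries` is a hypothetical helper, and `flaky` simulates a module call that fails twice before succeeding.

```python
import asyncio
import random


async def call_with_retries(func, *, attempts: int = 3, base_delay: float = 0.5, **kwargs):
    """Await func(**kwargs), retrying on any exception with exponential backoff and jitter."""
    for attempt in range(1, attempts + 1):
        try:
            return await func(**kwargs)
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            await asyncio.sleep(delay + random.uniform(0, delay / 2))


calls = {"count": 0}


async def flaky(question: str) -> str:
    """Stand-in for a module call that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return f"answered: {question}"


# In a notebook cell, use `await call_with_retries(...)` instead of asyncio.run.
answer = asyncio.run(
    call_with_retries(flaky, attempts=3, base_delay=0.01, question="ping")
)
print(answer, calls["count"])
```

In practice the `except` clause would be narrowed to the error types that are worth retrying, so that programming errors fail immediately.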


## 🎯 Key Takeaways

<div style="background: #e8f5e9; padding: 25px; border-radius: 10px; margin: 20px 0;">
    <h3 style="margin-top: 0;">🔍 Debugging Best Practices</h3>
    <ul style="margin: 15px 0;">
        <li>Enable <code>debug=True</code> during development</li>
        <li>Access complete request/response data via <code>result.request</code> and <code>result.response</code></li>
        <li>Monitor token usage and costs with <code>result.response['usage']</code></li>
        <li>Use environment variable <code>LOGILLM_DEBUG=1</code> for global debugging</li>
    </ul>    
    <h3 style="margin-top: 20px;">📊 Logging & Monitoring</h3>
    <ul style="margin: 15px 0;">
        <li>Use JSONL logging for structured, parseable logs</li>
        <li>Create custom callbacks for specific monitoring needs</li>
        <li>Track costs, latency, and errors systematically</li>
        <li>Analyze logs to optimize performance and costs</li>
    </ul>    
    <h3 style="margin-top: 20px;">💾 Persistence Strategies</h3>
    <ul style="margin: 15px 0;">
        <li>Use pickle for simple, reliable module saving, and only unpickle files you created or trust</li>
        <li>Create registries for managing multiple modules</li>
        <li>Save configured modules for consistent behavior</li>
        <li>Version your saved modules for reproducibility</li>
    </ul>    
    <h3 style="margin-top: 20px;">🏗️ Production Patterns</h3>
    <ul style="margin: 15px 0;">
        <li>Combine debugging, logging, and persistence</li>
        <li>Create wrapper classes for production modules</li>
        <li>Implement comprehensive error handling</li>
        <li>Monitor performance and costs continuously</li>
    </ul>
</div>

## 🎮 Interactive Exercise

Build your own monitored module with debugging, logging, and persistence!

In [14]:
# Exercise: Create a customer support ticket analyzer
# with full debugging, logging, and persistence

class SupportTicket(Signature):
    """Analyze customer support tickets."""
    ticket_text: str = InputField(desc="Support ticket content")
    customer_id: str = InputField(desc="Customer identifier")
    
    urgency: str = OutputField(desc="high, medium, or low")
    category: str = OutputField(desc="billing, technical, general")
    sentiment: str = OutputField(desc="positive, negative, neutral")
    needs_escalation: bool = OutputField(desc="Requires manager attention")
    suggested_response: str = OutputField(desc="Draft response to customer")

# TODO: Your implementation here!
# 1. Create a module with debug mode enabled
# 2. Setup JSONL logging
# 3. Create a custom callback to track urgent tickets
# 4. Save the configured module
# 5. Test with sample tickets

# Your code here:
ticket_analyzer = ChainOfThought(SupportTicket, adapter="json", debug=True)

# Setup logging
ticket_log = TEMP_DIR / "support_tickets.jsonl"
ticket_callback = register_jsonl_logger(str(ticket_log))

# Custom callback for urgent tickets
class UrgentTicketTracker(AbstractCallback):
    def __init__(self):
        self.urgent_tickets = []
    
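    async def on_module_end(self, event):
        # Only inspect events that carry a completed prediction.
        if hasattr(event, 'prediction') and event.prediction:
            outputs = event.prediction.outputs
            # The prediction may not expose the original inputs. In the sample
            # run below it does not, so customer_id falls back to 'Unknown'.
            inputs = getattr(event.prediction, 'inputs', None) or {}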
            
            if outputs.get('urgency') == 'high' or outputs.get('needs_escalation'):
                customer_id = inputs.get('customer_id', 'Unknown')
                self.urgent_tickets.append({
                    'time': time.time(),
                    'customer': customer_id,
                    'urgency': outputs.get('urgency'),
                    'category': outputs.get('category')
                })
                print(f"🚨 URGENT: Customer {customer_id} - {outputs.get('category')}")

urgent_tracker = UrgentTicketTracker()
CallbackManager().register(urgent_tracker)

# Test tickets
test_tickets = [
    ("My account was charged twice! This is unacceptable! I want a refund NOW!", "CUST-001"),
    ("How do I update my email address in the system?", "CUST-002"),
    ("The app crashes every time I try to login. I've lost important data!", "CUST-003")
]

print("📧 Analyzing Support Tickets:")
print("=" * 60)

for ticket_text, customer_id in test_tickets:
    result = await ticket_analyzer(
        ticket_text=ticket_text,
        customer_id=customer_id
    )
    
    print(f"\n🎫 Ticket from {customer_id}:")
    print(f"   Urgency: {result.outputs['urgency']}")
    print(f"   Category: {result.outputs['category']}")
    print(f"   Escalate: {result.outputs['needs_escalation']}")
    print(f"   Response: {result.outputs['suggested_response'][:100]}...")

# Save the analyzer (only unpickle files from sources you trust)
save_path = TEMP_DIR / "ticket_analyzer.pkl"
with open(save_path, 'wb') as f:
    pickle.dump(ticket_analyzer, f)

print(f"\n💾 Saved analyzer to: {save_path}")
print(f"\n🚨 Urgent tickets found: {len(urgent_tracker.urgent_tickets)}")

📧 Analyzing Support Tickets:
============================================================
💰 ChainOfThought call #7: No cost data
🚨 URGENT: Customer Unknown - billing

🎫 Ticket from CUST-001:
   Urgency: high
   Category: billing
   Escalate: False
   Response: We apologize for the inconvenience caused by the duplicate charge. We are reviewing your account and...
💰 ChainOfThought call #8: No cost data

🎫 Ticket from CUST-002:
   Urgency: low
   Category: general
   Escalate: False
   Response: Hello, thank you for reaching out. To update your email address in the system, please log into your ...
💰 ChainOfThought call #9: No cost data
🚨 URGENT: Customer Unknown - technical

🎫 Ticket from CUST-003:
   Urgency: high
   Category: technical
   Escalate: True
   Response: We are very sorry to hear about the issues you're experiencing with the app crashing upon login and ...

💾 Saved analyzer to: /tmp/logillm_notebook_c36fth53/ticket_analyzer.pkl

🚨 Urgent tickets found: 2
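
The exercise saves the analyzer but never reloads it. A minimal sketch of a versioned save and load round trip follows, using only `pickle` from the standard library. The names `FORMAT_VERSION`, `save_versioned`, and `load_versioned` are hypothetical, and the plain `config` dict is a stand-in for the configured module so the example runs without LogiLLM installed.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any

# Hypothetical version tag; bump it whenever the saved layout changes.
FORMAT_VERSION = 1


def save_versioned(obj: Any, path: Path) -> None:
    """Pickle obj inside an envelope that records the format version."""
    envelope = {"format_version": FORMAT_VERSION, "payload": obj}
    with open(path, "wb") as f:
        pickle.dump(envelope, f)


def load_versioned(path: Path) -> Any:
    """Load a versioned pickle, refusing files written with another version."""
    with open(path, "rb") as f:
        # pickle can execute arbitrary code: only load files you trust.
        envelope = pickle.load(f)
    found = envelope.get("format_version")
    if found != FORMAT_VERSION:
        raise ValueError(f"expected format version {FORMAT_VERSION}, found {found}")
    return envelope["payload"]


# Stand-in for the configured analyzer (assumption: any picklable object works).
config = {"signature": "SupportTicket", "adapter": "json", "debug": True}

save_path = Path(tempfile.mkdtemp()) / "ticket_analyzer_v1.pkl"
save_versioned(config, save_path)
restored = load_versioned(save_path)
print(restored == config)
```

Checking the version at load time turns a silent mismatch between old files and new code into an explicit error.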


## 🏁 Progress Check

In [15]:
# Progress tracker
completed = {
    "debugging_basics": True,
    "debug_inspection": True,
    "debug_methods": True,
    "jsonl_logging": True,
    "custom_callbacks": True,
    "saving_modules": True,
    "loading_modules": True,
    "module_registry": True,
    "production_patterns": True,
    "exercise": True
}

total = len(completed)
done = sum(completed.values())
percentage = (done / total) * 100 if total > 0 else 0

print(f"📊 Notebook Progress: {done}/{total} sections ({percentage:.0f}%)")
print("\n" + "█" * int(percentage // 5) + "░" * (20 - int(percentage // 5)))

if percentage == 100:
    print("\n🎉 Excellent! You've mastered Debugging, Logging & Persistence!")
    print("\n🔑 Key skills acquired:")
    print("  • Debug LLM interactions effectively")
    print("  • Implement comprehensive logging")
    print("  • Create custom monitoring callbacks")
    print("  • Save and load configured modules")
    print("  • Build production-ready systems")
    print("\nReady for Notebook 5: Optimization!")

📊 Notebook Progress: 10/10 sections (100%)

████████████████████

🎉 Excellent! You've mastered Debugging, Logging & Persistence!

🔑 Key skills acquired:
  • Debug LLM interactions effectively
  • Implement comprehensive logging
  • Create custom monitoring callbacks
  • Save and load configured modules
  • Build production-ready systems

Ready for Notebook 5: Optimization!


<div style="display: flex; justify-content: space-between; margin-top: 40px; padding: 20px; background: #f5f5f5; border-radius: 10px;">
    <a href="03_modules_adapters.ipynb" style="text-decoration: none; padding: 10px 20px; background: white; border-radius: 5px; border: 1px solid #ddd;">← Notebook 3</a>
    <div style="text-align: center;">
        <strong>Congratulations! You're ready for production! 🚀</strong>
    </div>
    <a href="05_optimization.ipynb" style="text-decoration: none; padding: 10px 20px; background: #667eea; color: white; border-radius: 5px;">Continue to Notebook 5 →</a>
</div>