# Part 1B: Demos

Companion notebook with runnable demos for Part 1B concepts.

**Prerequisites:**
- Ollama installed (`brew install ollama` or https://ollama.ai)
- Model: `ollama pull qwen3:4b` (recommended, ~2.6GB)
- `pip install instructor 'outlines[ollama]'` (the quotes keep shells such as zsh from expanding the brackets)

**Demos in this notebook:**
1. Structured Output with Instructor + Ollama
2. Structured Generation with Outlines

**Guardrails demos → See `guardrails_demo.ipynb`** (separate notebook for dependency isolation)


## Demo 1: Structured Output with Instructor

Instructor wraps LLM clients to return validated Pydantic objects instead of raw text.

**Why this matters:**
- LLMs generate text; applications consume structured data
- Prompt-only approach: ~85% reliable (model sometimes adds explanations, breaks JSON)
- Instructor: ~95-99% reliable (auto-retries with validation feedback)
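
To make the prompt-only failure mode concrete, here is a minimal sketch using only the standard library. The two replies are hand-written stand-ins for model output (assumptions, not captured responses): one is pure JSON, the other wraps the same JSON in a friendly sentence.

```python
import json
from typing import Optional


def parse_reply(reply: str) -> Optional[dict]:
    """Return the parsed JSON object, or None if the reply is not pure JSON."""
    try:
        return json.loads(reply)
    except json.JSONDecodeError:
        return None


# Hypothetical model replies, written by hand for illustration
clean_reply = '{"category": "billing", "priority": "high"}'
chatty_reply = 'Sure! Here is the ticket:\n{"category": "billing", "priority": "high"}'

print(parse_reply(clean_reply))   # parsed dict
print(parse_reply(chatty_reply))  # None: the leading sentence breaks json.loads
```

Even when parsing succeeds, nothing here checks field types or allowed values. That gap is what schema validation covers in the steps that follow.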


In [1]:
# STEP 1: Setup - Logging and Ollama check
import subprocess
import sys
import logging

# Configure logging for notebooks - color-coded levels
class ColoredFormatter(logging.Formatter):
    COLORS = {
        'DEBUG': '\033[90m',     # Gray
        'INFO': '\033[92m',      # Green
        'WARNING': '\033[93m',   # Yellow
        'ERROR': '\033[91m',     # Red
        'RESET': '\033[0m'
    }
    
    def format(self, record):
        color = self.COLORS.get(record.levelname, self.COLORS['RESET'])
        reset = self.COLORS['RESET']
        # Prefix the formatted text instead of mutating record.msg, so the
        # record stays clean if another handler formats it as well
        return f"{color}[{record.levelname}]{reset} {super().format(record)}"

# Setup logger
logger = logging.getLogger("demos")
logger.setLevel(logging.DEBUG)
if not logger.handlers:
    handler = logging.StreamHandler()
    handler.setFormatter(ColoredFormatter('%(message)s'))
    logger.addHandler(handler)

def check_ollama():
    """Check if Ollama is running and has a model."""
    try:
        result = subprocess.run(
            ["ollama", "list"], 
            capture_output=True, 
            text=True, 
            timeout=5
        )
        if result.returncode == 0:
            models = result.stdout.strip()
            logger.info("Ollama is running")
            print(f"\nAvailable models:\n{models}")
            return True
        else:
            logger.error("Ollama not responding. Run: ollama serve")
            return False
    except FileNotFoundError:
        logger.error("Ollama not installed. Install: brew install ollama")
        return False
    except subprocess.TimeoutExpired:
        logger.error("Ollama timed out. Run: ollama serve")
        return False

ollama_ready = check_ollama()

if ollama_ready:
    print()
    logger.info("Recommended: ollama pull qwen3:4b   (~2.6GB, great for structured output)")
    logger.info("Smaller:     ollama pull qwen3:1.7b (~1.4GB, still good)")
    logger.info("Reliable:    ollama pull qwen3:8b   (~5GB, best quality)")


[INFO] Ollama is running
[INFO] Recommended: ollama pull qwen3:4b   (~2.6GB, great for structured output)
[INFO] Smaller:     ollama pull qwen3:1.7b (~1.4GB, still good)
[INFO] Reliable:    ollama pull qwen3:8b   (~5GB, best quality)



Available models:
NAME           ID              SIZE      MODIFIED   
qwen3:4b       359d7dd4bcda    2.5 GB    2 days ago    
llama3.2:1b    baf6a787fdff    1.3 GB    2 days ago
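
The `ollama list` output above is a whitespace-aligned table. As a small sketch, it can be parsed into exact `name:tag` strings, which avoids substring false positives when checking for a model (for example, mistaking `qwen3:1.7b` for `qwen3:4b`). The helper name `installed_models` is hypothetical, and the sample text is copied from the output above.

```python
from typing import List


def installed_models(ollama_list_output: str) -> List[str]:
    """Extract the NAME column from `ollama list` output, skipping the header row."""
    names = []
    for line in ollama_list_output.strip().splitlines()[1:]:
        parts = line.split()
        if parts:
            names.append(parts[0])
    return names


sample_output = """NAME           ID              SIZE      MODIFIED
qwen3:4b       359d7dd4bcda    2.5 GB    2 days ago
llama3.2:1b    baf6a787fdff    1.3 GB    2 days ago
"""

models = installed_models(sample_output)
print(models)
print("qwen3:4b" in models, "qwen3:8b" in models)
```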



In [2]:
# STEP 2: Define Pydantic schema for structured extraction
from pydantic import BaseModel, Field, field_validator
from typing import List
from enum import Enum

class Priority(str, Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"

class SupportTicket(BaseModel):
    """Structured representation of a customer support ticket."""
    
    category: str = Field(description="Primary issue category (e.g., billing, technical, shipping)")
    priority: Priority = Field(description="Urgency level based on customer tone and issue severity")
    summary: str = Field(description="One-sentence summary of the issue", max_length=200)
    entities: List[str] = Field(
        default_factory=list,
        description="Products, order numbers, or features mentioned"
    )
    sentiment: float = Field(
        ge=-1.0, le=1.0,
        description="Sentiment from -1 (angry) to 1 (happy)"
    )

print("Schema defined: SupportTicket")
print(f"Fields: {list(SupportTicket.model_fields.keys())}")


Schema defined: SupportTicket
Fields: ['category', 'priority', 'summary', 'entities', 'sentiment']
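
A Pydantic model can also emit a JSON Schema, which is the machine-readable form of the constraints declared with `Field`. The sketch below uses a trimmed, hypothetical stand-in (`MiniTicket`) rather than the full `SupportTicket`, and assumes Pydantic v2.

```python
from enum import Enum
from typing import List

from pydantic import BaseModel, Field


class Priority(str, Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


class MiniTicket(BaseModel):
    """Trimmed, hypothetical stand-in for SupportTicket."""
    category: str = Field(description="Primary issue category")
    priority: Priority
    entities: List[str] = Field(default_factory=list)
    sentiment: float = Field(ge=-1.0, le=1.0)


schema = MiniTicket.model_json_schema()

print(sorted(schema["properties"]))        # field names
print(schema["required"])                  # fields without a default
print(schema["properties"]["sentiment"])   # numeric bounds from ge/le
print(schema["$defs"]["Priority"]["enum"]) # allowed enum values
```

Fields with a default (here `entities`) are left out of `required`, so the model may omit them without failing validation.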


In [3]:
# STEP 3: Initialize Instructor with Ollama
import instructor
from openai import OpenAI

# Ollama exposes OpenAI-compatible API at localhost:11434/v1
# This is why instructor.from_openai() works - same API shape
ollama_client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"  # Ollama ignores this, but OpenAI client requires it
)

# Instructor wraps the client to add structured output capabilities
client = instructor.from_openai(
    ollama_client,
    mode=instructor.Mode.JSON  # JSON mode works best with Ollama
)

# Model selection - Qwen3 excels at instruction following and structured output
MODEL = "qwen3:4b"  # Recommended. Alternatives: qwen3:1.7b (smaller), qwen3:8b (better)

# Verify model availability
import subprocess

try:
    result = subprocess.run(["ollama", "list"], capture_output=True, text=True)
    # Compare the full "name:tag" so qwen3:1.7b is not mistaken for qwen3:4b
    if MODEL in result.stdout.split():
        logger.info(f"Instructor client ready with model: {MODEL}")
    else:
        logger.warning(f"Model {MODEL} not found. Available models:")
        print(result.stdout)
        logger.info(f"Pull it with: ollama pull {MODEL}")
except Exception as e:
    logger.error(f"Could not verify model: {e}")


[INFO] Instructor client ready with model: qwen3:4b


In [None]:
# STEP 4: Extract structured data from raw customer messages

# More explicit system prompt - critical for smaller models
SYSTEM_PROMPT = """You are a support ticket classifier. Extract information from customer messages.

RESPOND WITH JSON ONLY. No explanations. Use this exact structure:
{
  "category": "billing" or "technical" or "shipping" or "account" or "feedback",
  "priority": "high" or "medium" or "low",
  "summary": "one sentence describing the issue",
  "entities": ["list", "of", "mentioned", "items"],
  "sentiment": -1.0 to 1.0 (negative to positive)
}"""

def extract_ticket(raw_message: str) -> SupportTicket:
    """Extract structured ticket from raw customer message."""
    logger.debug(f"Extracting from: {raw_message[:50]}...")
    
    try:
        result = client.chat.completions.create(
            model=MODEL,
            response_model=SupportTicket,
            max_retries=3,  # More retries for smaller models
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": f"Customer message:\n{raw_message}"}
            ]
        )
        logger.info("Extraction successful")
        return result
    except Exception as e:
        logger.error(f"Extraction failed: {type(e).__name__}")
        raise

# Test messages
test_messages = [
    """
    I've been trying to reset my password for 3 days now!
    The mobile app keeps crashing when I tap "Forgot Password".
    This is ridiculous - I need access to my account for work.
    Order #12345 is stuck and I can't track it.
    """,
    
    """
    Hi, just wanted to say the new Pro subscription is amazing!
    The export feature saved me hours. Quick question - can I add 
    more team members to my account?
    """,
    
    """
    My invoice shows €299 but I was charged €399. Please fix this 
    immediately. Account: ACC-2024-789. This is the third billing 
    error this year.
    """
]

print("Extracting structured tickets from raw messages...")
print("=" * 65)

for i, msg in enumerate(test_messages, 1):
    print(f"\n[Message {i}]")
    print(f"Input: {msg.strip()[:70]}...")
    
    try:
        ticket = extract_ticket(msg)
        print(f"\n  ✓ Extracted SupportTicket:")
        print(f"    category:  {ticket.category}")
        print(f"    priority:  {ticket.priority.value}")
        print(f"    summary:   {ticket.summary[:60]}...")
        print(f"    entities:  {ticket.entities}")
        print(f"    sentiment: {ticket.sentiment:+.2f}")
    except Exception as e:
        logger.error(f"Failed: {e}")
    
    print("-" * 65)


[DEBUG] Extracting from: 
    I've been trying to reset my password for 3 d...


Extracting structured tickets from raw messages...

[Message 1]
Input: I've been trying to reset my password for 3 days now!
    The mobile a...


In [None]:
# STEP 5: Instructor output demo - what you get back

sample_msg = "Cancel my subscription NOW. Order #99999 never arrived. This is fraud!"

logger.info("Extracting ticket from angry customer message...")
ticket = extract_ticket(sample_msg)

print("\n" + "=" * 65)
print("INSTRUCTOR GIVES YOU A PYDANTIC OBJECT, NOT A STRING")
print("=" * 65)

print("\n1. Type-safe attribute access:")
print(f"   ticket.category  → {ticket.category!r}")
print(f"   ticket.priority  → {ticket.priority}  (Enum, not string)")
print(f"   ticket.sentiment → {ticket.sentiment}  (float, constrained to [-1, 1])")
print(f"   ticket.entities  → {ticket.entities}  (List[str])")

print("\n2. Export to JSON (for APIs):")
print(ticket.model_dump_json(indent=2))

print("\n3. Export to dict (for databases):")
print(ticket.model_dump())

logger.info("This is what 'structured output' means - no JSON parsing, no try/except")


[INFO] Extracting ticket from angry customer message...
[DEBUG] Extracting from: Cancel my subscription NOW. Order #99999 never arr...
[INFO] Extraction successful
[INFO] This is what 'structured output' means - no JSON parsing, no try/except



INSTRUCTOR GIVES YOU A PYDANTIC OBJECT, NOT A STRING

1. Type-safe attribute access:
   ticket.category  → 'billing'
   ticket.priority  → Priority.HIGH  (Enum, not string)
   ticket.sentiment → -1.0  (float, constrained to [-1, 1])
   ticket.entities  → ['Order #99999']  (List[str])

2. Export to JSON (for APIs):
{
  "category": "billing",
  "priority": "high",
  "summary": "Customer wants to cancel subscription immediately as order #99999 never arrived and is fraudulent.",
  "entities": [
    "Order #99999"
  ],
  "sentiment": -1.0
}

3. Export to dict (for databases):
{'category': 'billing', 'priority': <Priority.HIGH: 'high'>, 'summary': 'Customer wants to cancel subscription immediately as order #99999 never arrived and is fraudulent.', 'entities': ['Order #99999'], 'sentiment': -1.0}
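
Note that the dict export above still holds a `Priority` enum member. If the target (a JSON column, a message queue) needs plain values, `model_dump(mode="json")` is one option. A minimal sketch with a hypothetical two-field model, assuming Pydantic v2:

```python
from enum import Enum

from pydantic import BaseModel


class Priority(str, Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


class MiniTicket(BaseModel):
    """Hypothetical two-field stand-in for SupportTicket."""
    category: str
    priority: Priority


ticket = MiniTicket(category="billing", priority="high")

python_dict = ticket.model_dump()           # keeps Python objects (Enum member)
json_dict = ticket.model_dump(mode="json")  # converts to JSON-compatible values

print(type(python_dict["priority"]).__name__)
print(json_dict)
```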


In [None]:
# STEP 6: Validation demo - what happens when LLM returns bad data

from pydantic import ValidationError

print("=" * 65)
print("PYDANTIC VALIDATION BASICS")
print("=" * 65)

# **bad_data is Python's dict unpacking syntax:
# SupportTicket(**{"a": 1, "b": 2}) == SupportTicket(a=1, b=2)
bad_data = {"category": "test", "priority": "urgent", "summary": "x", "sentiment": 5.0}

print(f"\nAttempting: SupportTicket(**bad_data)")
print(f"Where bad_data = {bad_data}")
print()

try:
    SupportTicket(**bad_data)  # ** unpacks dict as keyword arguments
except ValidationError as e:
    logger.error("Pydantic caught invalid data:")
    print(e)


PYDANTIC VALIDATION BASICS

Attempting: SupportTicket(**bad_data)
Where bad_data = {'category': 'test', 'priority': 'urgent', 'summary': 'x', 'sentiment': 5.0}

[ERROR] Pydantic caught invalid data:
2 validation errors for SupportTicket
priority
  Input should be 'high', 'medium' or 'low' [type=enum, input_value='urgent', input_type=str]
    For further information visit https://errors.pydantic.dev/2.12/v/enum
sentiment
  Input should be less than or equal to 1 [type=less_than_equal, input_value=5.0, input_type=float]
    For further information visit https://errors.pydantic.dev/2.12/v/less_than_equal
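
The printed report is meant for humans. The same information is available as data through `ValidationError.errors()`, which is useful for logging or for building feedback messages. A sketch with a hypothetical two-field model and helper (`describe_errors`), assuming Pydantic v2:

```python
from enum import Enum
from typing import List, Tuple

from pydantic import BaseModel, Field, ValidationError


class Priority(str, Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


class MiniTicket(BaseModel):
    """Hypothetical stand-in with the two fields that failed above."""
    priority: Priority
    sentiment: float = Field(ge=-1.0, le=1.0)


def describe_errors(data: dict) -> List[Tuple[str, str]]:
    """Return (field, error_type) pairs, or an empty list if the data is valid."""
    try:
        MiniTicket(**data)
    except ValidationError as exc:
        return [(err["loc"][0], err["type"]) for err in exc.errors()]
    return []


problems = describe_errors({"priority": "urgent", "sentiment": 5.0})
print(problems)
```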


In [None]:
# STEP 7: Demo - Instructor's retry mechanism with error feedback

# Create a STRICT schema - DON'T tell the model the valid categories!
class StrictTicket(BaseModel):
    """Schema with strict constraints to trigger validation failures."""
    category: str = Field(description="The issue category")  # Vague on purpose
    priority: Priority
    confidence: float = Field(ge=0.0, le=1.0, description="Model's confidence 0-1")
    
    @field_validator('category')
    @classmethod
    def validate_category(cls, v):
        # Very restrictive - only 3 allowed values
        allowed = ['billing', 'technical', 'shipping']
        if v.lower() not in allowed:
            raise ValueError(f"category must be one of {allowed}, got '{v}'")
        return v.lower()

# Track attempts (1 = passed on the first try)
retry_count = 0

def _count_attempt(*args, **kwargs):
    """Hook handler: called once per LLM request, including internal retries."""
    global retry_count
    retry_count += 1
    if retry_count > 1:
        logger.warning(f"Attempt #{retry_count} - LLM correcting based on validation error")

def extract_with_retry_logging(message: str) -> StrictTicket:
    """Extract with visible retry tracking."""
    global retry_count
    retry_count = 0
    
    # Instructor retries inside a single create() call, so wrapping create()
    # from the outside would always count exactly one call. The
    # "completion:kwargs" hook fires before every request instead.
    # (Assumes an Instructor version that provides client.on / client.off.)
    client.on("completion:kwargs", _count_attempt)
    
    try:
        # KEY: Don't tell the model the valid categories - let it fail naturally
        result = client.chat.completions.create(
            model=MODEL,
            response_model=StrictTicket,
            max_retries=4,
            messages=[
                {"role": "system", "content": "Extract ticket info as JSON. Classify the category."},
                {"role": "user", "content": message}
            ]
        )
        return result
    finally:
        client.off("completion:kwargs", _count_attempt)

print("=" * 65)
print("INSTRUCTOR RETRY MECHANISM - Error Feedback to LLM")
print("=" * 65)
print("""
Setup: Schema allows only ['billing', 'technical', 'shipping']
       BUT we don't tell the model this constraint!
       
When the model outputs an invalid category, Instructor:
1. Catches the ValidationError
2. Appends error to messages: "category must be one of [...], got 'account'"
3. Asks LLM to try again
4. LLM corrects its output
""")

# Messages that naturally map to invalid categories
test_cases = [
    "I love your product! Best purchase ever!",           # → "feedback" or "general" (invalid)
    "Please delete my account and all my data",           # → "account" (invalid)
    "I want to upgrade to the premium plan",              # → "sales" or "upgrade" (invalid)
    "My package arrived damaged",                          # → "shipping" (valid - should pass first try)
]

for msg in test_cases:
    logger.info(f"Testing: '{msg[:45]}...'")
    
    try:
        ticket = extract_with_retry_logging(msg)
        status = "✓ passed" if retry_count == 1 else f"✓ corrected after {retry_count} attempts"
        print(f"  {status} → category: {ticket.category}")
    except Exception as e:
        logger.error(f"Failed after max retries: {type(e).__name__}")
    print()

# Note on results
print("=" * 65)
print("NOTE: If all pass on first try, that's a GOOD sign!")
print("""
Better models = fewer retries needed. Qwen3:4b is smart enough to 
infer valid categories from context even without explicit constraints.

In production, this means:
- Strong models (GPT-4o, Claude Sonnet, Qwen3) rarely trigger retries
- Weaker models benefit more from Instructor's retry mechanism
- The retry logic is your safety net, not your primary path
""")


INSTRUCTOR RETRY MECHANISM - Error Feedback to LLM

Setup: Schema allows only ['billing', 'technical', 'shipping']
       BUT we don't tell the model this constraint!

When the model outputs an invalid category, Instructor:
1. Catches the ValidationError
2. Appends error to messages: "category must be one of [...], got 'account'"
3. Asks LLM to try again
4. LLM corrects its output

[INFO] Testing: 'I love your product! Best purchase ever!...'
  ✓ passed → category: technical

[INFO] Testing: 'Please delete my account and all my data...'
  ✓ passed → category: technical

[INFO] Testing: 'I want to upgrade to the premium plan...'
  ✓ passed → category: billing

[INFO] Testing: 'My package arrived damaged...'
  ✓ passed → category: shipping

NOTE: If all pass on first try, that's a GOOD sign!

Better models = fewer retries needed. Qwen3:4b is smart enough to 
infer valid categories from context even without explicit constraints.

In production, this means:
- Strong models (GPT-4o, Claude Sonnet, Qwen3) rarely trigger retries
- Weaker models benefit more from Instructor's retry mechanism
- The retry logic is your safety net, not your primary path
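
The four-step loop described in this demo can be sketched without any LLM at all. In the simulation below, `fake_llm` is a hypothetical stand-in that answers `account` until it sees validation feedback in the conversation. It illustrates the validate-and-feed-back pattern only and is not Instructor's actual implementation.

```python
from typing import Callable, Dict, List, Tuple

ALLOWED = ["billing", "technical", "shipping"]


def validate_category(value: str) -> str:
    """Same rule as StrictTicket.validate_category."""
    if value.lower() not in ALLOWED:
        raise ValueError(f"category must be one of {ALLOWED}, got '{value}'")
    return value.lower()


def fake_llm(messages: List[Dict[str, str]]) -> str:
    """Hypothetical model: corrects itself only after seeing the error text."""
    saw_feedback = any("category must be one of" in m["content"] for m in messages)
    return "technical" if saw_feedback else "account"


def extract_with_feedback(
    llm: Callable[[List[Dict[str, str]]], str],
    user_message: str,
    max_attempts: int = 3,
) -> Tuple[str, int]:
    """Call the model, validate, and append the error to the chat on failure."""
    messages = [{"role": "user", "content": user_message}]
    for attempt in range(1, max_attempts + 1):
        raw = llm(messages)
        try:
            return validate_category(raw), attempt
        except ValueError as err:
            # Feed the failed answer and the validation error back to the model
            messages.append({"role": "assistant", "content": raw})
            messages.append({"role": "user", "content": f"Validation failed: {err}. Try again."})
    raise RuntimeError(f"no valid output after {max_attempts} attempts")


category, attempts = extract_with_feedback(fake_llm, "Please delete my account and all my data")
print(category, attempts)
```

The error message doubles as the correction prompt, which is why descriptive `ValueError` text in validators matters.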



## Instructor Takeaways

**Instructor provides:**
- Pydantic schema → automatically included in the request sent to the model
- Response parsing with type coercion
- Validation with auto-retry on failure
- Works with OpenAI-compatible APIs (OpenAI, Ollama, etc.) and, through dedicated client wrappers, other providers such as Anthropic

**Cloud API setup** (when not using Ollama):
```python
from openai import OpenAI
import instructor

client = instructor.from_openai(OpenAI())  # Uses OPENAI_API_KEY env var
```


## Demo 2: Structured Generation with Outlines

Outlines is a structured generation library that works differently depending on the backend:
- **API backends (Ollama, OpenAI, vLLM server):** Uses provider's JSON mode
- **Local backends (HuggingFace, vLLM offline):** True token masking during generation

**Key insight:** Token masking (regex, grammar constraints) only works with LOCAL models!
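
To show what token masking means, here is a toy sketch in plain Python. The vocabulary, the scores in `fake_logits`, and the prefix-matching rule are all assumptions for illustration. Real libraries work over the model's actual tokenizer vocabulary and compile the constraint ahead of time, but the core idea is the same: disallowed tokens get a score of negative infinity before the next token is picked.

```python
import math
from typing import Dict

ALLOWED_OUTPUTS = ["billing", "technical", "shipping"]
VOCAB = ["bill", "ing", "tech", "nical", "ship", "ping", "account", "s"]


def fake_logits(prefix: str) -> Dict[str, float]:
    """Hypothetical model scores: strongly prefers the invalid token 'account'."""
    scores = {tok: 0.0 for tok in VOCAB}
    scores["account"] = 5.0
    scores["ship"] = 1.0
    scores["ping"] = 1.0
    return scores


def mask(prefix: str, logits: Dict[str, float]) -> Dict[str, float]:
    """Keep a token's score only if prefix + token can still reach an allowed output."""
    return {
        tok: score if any(out.startswith(prefix + tok) for out in ALLOWED_OUTPUTS) else -math.inf
        for tok, score in logits.items()
    }


def constrained_decode() -> str:
    """Greedy decoding where every step picks the best token that survives the mask."""
    text = ""
    while text not in ALLOWED_OUTPUTS:
        masked = mask(text, fake_logits(text))
        best = max(masked, key=masked.get)
        if masked[best] == -math.inf:
            raise RuntimeError("no valid continuation")
        text += best
    return text


unconstrained_first = max(fake_logits(""), key=fake_logits("").get)
print(unconstrained_first)   # the model's own preference
print(constrained_decode())  # the best output that satisfies the constraint
```

This needs access to the scores at every step, which is why it is only possible when the library runs the model itself.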

**When to use Outlines:**
- Self-hosting models + need regex/grammar constraints
- High-volume GPU inference with vLLM offline
- Want unified API across local and cloud models


In [None]:
# STEP 8: Setup Outlines with Ollama (same model we used for Instructor!)
# pip install 'outlines[ollama]'

print("=" * 65)
print("OUTLINES: Structured Generation Library")
print("=" * 65)
print("""
Outlines supports many backends with different capabilities:
  • API backends (Ollama, OpenAI, vLLM server): JSON schema via native API
  • Local backends (HuggingFace, vLLM offline): True token masking

With APIs, Outlines uses the provider's JSON mode - reliable but no regex.
With local models, Outlines masks tokens during sampling - full control.

We'll demo with Ollama (JSON mode) since we already have it running.
""")

import outlines
import ollama

# Outlines 1.2.x: Use Ollama (same model as Instructor demo!)
MODEL_NAME = "qwen3:4b"

logger.info(f"Connecting to Ollama model: {MODEL_NAME}")

model = None
try:
    # Create Ollama client, then wrap with Outlines
    ollama_client = ollama.Client()
    model = outlines.from_ollama(ollama_client, model_name=MODEL_NAME)
    logger.info("Ollama model connected!")
except Exception as e:
    logger.error(f"Failed to connect: {type(e).__name__}: {e}")
    print("\nTroubleshooting:")
    print("  1. pip install 'outlines[ollama]'")
    print("  2. Ensure Ollama is running: ollama serve")
    print(f"  3. Pull model if needed: ollama pull {MODEL_NAME}")


OUTLINES: Structured Generation Library

Outlines supports many backends with different capabilities:
  • API backends (Ollama, OpenAI, vLLM server): JSON schema via native API
  • Local backends (HuggingFace, vLLM offline): True token masking

With APIs, Outlines uses the provider's JSON mode - reliable but no regex.
With local models, Outlines masks tokens during sampling - full control.

We'll demo with Ollama (JSON mode) since we already have it running.

[INFO] Connecting to Ollama model: qwen3:4b
[INFO] Ollama model connected!



In [None]:
# STEP 9: Constrained generation with Pydantic schema

if model is not None:
    from pydantic import BaseModel as PydanticBase
    from typing import Literal
    
    # Define output schema as Pydantic model
    class TicketClassification(PydanticBase):
        category: Literal["billing", "technical", "shipping"]
        priority: int  # 1-5
        summary: str
    
    print("=" * 65)
    print("CONSTRAINED GENERATION RESULTS")
    print("=" * 65)
    print("Schema enforces: category ∈ {billing, technical, shipping}")
    print()
    
    # Test messages
    test_prompts = [
        "Classify this support ticket: My payment was declined twice and I'm locked out",
        "Classify this support ticket: Package never arrived, tracking shows delivered",
        "Classify this support ticket: App crashes when I try to export reports",
    ]
    
    for prompt in test_prompts:
        logger.info(f"Prompt: {prompt[:50]}...")
        
        try:
            # Outlines 1.2.x simple API: model(prompt, output_type)
            result = model(prompt, TicketClassification)
            print(f"  Output: {result}")
            print(f"  Type: {type(result).__name__}")
        except Exception as e:
            logger.error(f"Generation failed: {type(e).__name__}: {e}")
        print()
    
    print("=" * 65)
    print("NOTE: With Ollama, Outlines uses Ollama's JSON mode (not token masking).")
    print("Output is JSON string, not Pydantic object. Still reliable, but different mechanism.")
    print("For true grammar-level constraints, use HuggingFace transformers backend.")
else:
    logger.warning("Skipping demo - model not loaded")


CONSTRAINED GENERATION RESULTS
Schema enforces: category ∈ {billing, technical, shipping}

[INFO] Prompt: Classify this support ticket: My payment was decli...
  Output: {
  "category": "shipping",
  "priority": 2,
  "summary": "Payment declined twice and locked out - urgent account access issue"
}
  Type: str

[INFO] Prompt: Classify this support ticket: Package never arrive...
  Output: {
  "category": "shipping",
  "priority": 2,
  "summary": "Package never arrived despite tracking showing delivery status as 'delivered' - tracking discrepancy detected"
}
  Type: str

[INFO] Prompt: Classify this support ticket: App crashes when I t...
  Output: {
  "category": "technical",
  "priority": 2,
  "summary": "App crashes when attempting to export reports (feature-specific crash during report export operation)"
}
  Type: str

NOTE: With Ollama, Outlines uses Ollama's JSON mode (not token masking).
Output is JSON string, not Pydantic object. Still reliable, but different mechanism.
For true grammar-level constraints, use HuggingFace transformers backend.
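
Since the result arrives as a JSON string, one way to get back to a typed object is to validate it against the same Pydantic class. A minimal sketch, assuming Pydantic v2. The `raw` string is a shortened, hand-edited version of the third output above.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class TicketClassification(BaseModel):
    category: Literal["billing", "technical", "shipping"]
    priority: int
    summary: str


# Shortened stand-in for a JSON string returned by the model
raw = '{"category": "technical", "priority": 2, "summary": "App crashes when exporting reports"}'

ticket = TicketClassification.model_validate_json(raw)
print(type(ticket).__name__, ticket.category, ticket.priority)

# A category outside the Literal is rejected at parse time
try:
    TicketClassification.model_validate_json('{"category": "sales", "priority": 1, "summary": "x"}')
    rejected = False
except ValidationError:
    rejected = True
print(rejected)
```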


In [None]:
# STEP 10: Regex constraints - which backends support them?

print("=" * 65)
print("OUTLINES BACKEND CAPABILITIES")
print("=" * 65)
print("""
Regex/grammar constraints require token-level control during generation.
This depends on HOW you connect to the model:

┌────────────────────────────┬──────────────┬─────────────────┐
│ Backend                    │ JSON Schemas │ Regex/Grammar   │
├────────────────────────────┼──────────────┼─────────────────┤
│ Ollama (from_ollama)       │ ✓            │ ✗ (black-box)   │
│ OpenAI (from_openai)       │ ✓            │ ✗ (black-box)   │
│ vLLM server (from_vllm)    │ ✓            │ ✗ (API mode)    │
│ vLLM local (from_vllm_offline) │ ✓        │ ✓ Full support  │
│ HuggingFace (from_transformers)│ ✓        │ ✓ Full support  │
│ llama.cpp (from_llamacpp)  │ ✓            │ ✓ Full support  │
└────────────────────────────┴──────────────┴─────────────────┘

For production with full grammar control:
  # vLLM (offline mode - fastest for GPUs)
  from vllm import LLM
  model = outlines.from_vllm_offline(LLM("meta-llama/Llama-3-8B"))
  
  # HuggingFace (simpler setup)
  model = outlines.from_transformers(hf_model, tokenizer)
  
  regex_type = outlines.types.regex(r"PRD-[0-9]{3}")
  result = model("Generate code:", regex_type)  # Guaranteed PRD-XXX
""")


OUTLINES BACKEND CAPABILITIES

Regex/grammar constraints require token-level control during generation.
This depends on HOW you connect to the model:

┌────────────────────────────┬──────────────┬─────────────────┐
│ Backend                    │ JSON Schemas │ Regex/Grammar   │
├────────────────────────────┼──────────────┼─────────────────┤
│ Ollama (from_ollama)       │ ✓            │ ✗ (black-box)   │
│ OpenAI (from_openai)       │ ✓            │ ✗ (black-box)   │
│ vLLM server (from_vllm)    │ ✓            │ ✗ (API mode)    │
│ vLLM local (from_vllm_offline) │ ✓        │ ✓ Full support  │
│ HuggingFace (from_transformers)│ ✓        │ ✓ Full support  │
│ llama.cpp (from_llamacpp)  │ ✓            │ ✓ Full support  │
└────────────────────────────┴──────────────┴─────────────────┘

For production with full grammar control:
  # vLLM (offline mode - fastest for GPUs)
  from vllm import LLM
  model = outlines.from_vllm_offline(LLM("meta-llama/Llama-3-8B"))

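  # HuggingFace (simpler setup)
  model = outlines.from_transformers(hf_model, tokenizer)

  regex_type = outlines.types.regex(r"PRD-[0-9]{3}")
  result = model("Generate code:", regex_type)  # Guaranteed PRD-XXX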
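
On backends where the pattern cannot be enforced during generation, a possible fallback is to check the output after the fact and retry or reject on mismatch. The sketch below uses only the standard library `re` module. The helper name `is_valid_code` is hypothetical, and this is a post-hoc check, not constrained generation.

```python
import re

# Same pattern as the PRD-XXX example above
PRODUCT_CODE = re.compile(r"PRD-[0-9]{3}")


def is_valid_code(text: str) -> bool:
    """True only if the whole (whitespace-trimmed) output matches the pattern."""
    return PRODUCT_CODE.fullmatch(text.strip()) is not None


for candidate in ["PRD-123", " PRD-007\n", "PRD-12", "Code: PRD-123"]:
    print(repr(candidate), is_valid_code(candidate))
```

`fullmatch` is used instead of `search` so that extra text around a valid code counts as a failure, mirroring what a generation-time constraint would guarantee.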


In [None]:
# STEP 11: Comparison summary

print("=" * 65)
print("INSTRUCTOR vs OUTLINES: When to use which")
print("=" * 65)
print("""
┌──────────────────┬─────────────────────┬─────────────────────────────┐
│                  │ INSTRUCTOR          │ OUTLINES                    │
├──────────────────┼─────────────────────┼─────────────────────────────┤
│ Mechanism        │ Validate + retry    │ API: JSON mode              │
│                  │                     │ Local: Token masking        │
│ APIs (Ollama,    │ ✓ Full support      │ ✓ JSON schemas only         │
│   OpenAI, vLLM)  │                     │                             │
│ Local models     │ ✓ Via API wrapper   │ ✓ Full regex/grammar        │
│ (HF, vLLM offline)│                    │                             │
│ Returns          │ Pydantic object     │ API: str, Local: object     │
│ Retry on fail    │ ✓ With error context│ N/A (constraints prevent)   │
│ Custom validators│ ✓ Full Pydantic     │ Limited                     │
└──────────────────┴─────────────────────┴─────────────────────────────┘

DECISION GUIDE:
  • Using APIs (Ollama, OpenAI, vLLM server)? → Instructor (simpler DX)
  • Need Pydantic validators, retry with error feedback? → Instructor
  • Self-hosting + need regex/grammar constraints? → Outlines (local mode)
  • High-volume GPU inference? → Outlines + vLLM offline (fastest)
""")


INSTRUCTOR vs OUTLINES: When to use which

┌──────────────────┬─────────────────────┬─────────────────────────────┐
│                  │ INSTRUCTOR          │ OUTLINES                    │
├──────────────────┼─────────────────────┼─────────────────────────────┤
│ Mechanism        │ Validate + retry    │ API: JSON mode              │
│                  │                     │ Local: Token masking        │
│ APIs (Ollama,    │ ✓ Full support      │ ✓ JSON schemas only         │
│   OpenAI, vLLM)  │                     │                             │
│ Local models     │ ✓ Via API wrapper   │ ✓ Full regex/grammar        │
│ (HF, vLLM offline)│                    │                             │
│ Returns          │ Pydantic object     │ API: str, Local: object     │
│ Retry on fail    │ ✓ With error context│ N/A (constraints prevent)   │
│ Custom validators│ ✓ Full Pydantic     │ Limited                     │
└──────────────────┴─────────────────────┴─────────────────────────────┘

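DECISION GUIDE:
  • Using APIs (Ollama, OpenAI, vLLM server)? → Instructor (simpler DX)
  • Need Pydantic validators, retry with error feedback? → Instructor
  • Self-hosting + need regex/grammar constraints? → Outlines (local mode)
  • High-volume GPU inference? → Outlines + vLLM offline (fastest)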

## Demos 3-4: Guardrails (Separate Notebook)

Guardrails demos have been moved to **`guardrails_demo.ipynb`** for:

- **Dependency isolation:** `guardrails-ai` requires `openai<2.0.0` (conflicts with Instructor)
- **Comprehensive coverage:** NeMo, Guardrails AI, and Haystack demos
- **Separate environment:** Can run in isolated env if needed

**See `1B/guardrails_demo.ipynb` for:**
1. NeMo Guardrails - Dialog flow with Colang
2. Guardrails AI - Field-level validation with Hub
3. Haystack - Pipeline-native components
4. Complete comparison and decision guide