# Pattern 17 & 32: LLM-as-Judge & Guardrails (Composable App Tutorial)

## Learning Objectives
By completing this tutorial, you will:
- Understand the LLM-as-Judge evaluation paradigm
- Learn when to use judges vs. rule-based validation
- Implement InputGuardrail for content policy enforcement
- Design boolean judge prompts with accept conditions
- Execute guardrails in parallel with asyncio.gather()
- Create custom guardrails for domain-specific validation

## Prerequisites
- **Python**: Intermediate proficiency with async/await
- **LLM basics**: Understanding of prompting and structured outputs
- **Setup**: OpenAI API key configured in `.env`
- **Prior tutorials**: Recommended: complete [Multi-Agent Workflow](../concepts/multi_agent_workflow.md) first

## Estimated Time
25-30 minutes (reading + execution)

## Cost Estimate
⚠️ **API costs**: ~$0.01-0.05 (about 18 guardrail checks using `llms.SMALL_MODEL`, a small, inexpensive model)

> **Book Reference**: These patterns are detailed in *Generative AI Design Patterns*
> (Lakshmanan & Hapke, 2025):
> - Pattern 17: "LLM-as-Judge" (evaluation)
> - Pattern 32: "Guardrails" (input/output validation)

---

## What is LLM-as-Judge?

**LLM-as-Judge** is a pattern where a language model acts as an automated evaluator or validator. Instead of writing complex rule-based logic, we leverage an LLM's reasoning capabilities to assess quality, safety, or compliance.

### Two Primary Use Cases

#### 1. Evaluation (Pattern 17)
Assess **output quality** after generation:
- "Is this response helpful?"
- "Does the article contain hallucinations?"
- "Rate the tone from 1-5"

#### 2. Guardrails (Pattern 32)
Validate **input/output** before/after processing:
- "Is this topic appropriate for K-12?"
- "Does the query contain harmful content?"
- "Is the response compliant with policy?"

### When to Use LLM-as-Judge

| Scenario | LLM-as-Judge | Rule-Based |
|----------|----------------|---------------|
| Subjective quality (helpfulness, tone) | ✅ Better | ❌ Difficult |
| Complex criteria ("age-appropriate") | ✅ Better | ❌ Hard to define |
| Rapid iteration on criteria | ✅ Change prompt | ❌ Rewrite code |
| Simple pattern matching (profanity) | ❌ Overkill | ✅ Faster/cheaper |
| Factual accuracy with ground truth | ❌ Use metrics | ✅ Exact match |
| Deterministic validation (email format) | ❌ Use regex | ✅ Precise |
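For contrast, here is what the right-hand column looks like in practice. This is a minimal sketch of two deterministic checks: an email-format regex and a word-list lookup. Both the regex and the word list are illustrative assumptions, not a production-grade validator. The point is that these checks give the same answer every time and never call an API.

```python
import re

# Illustrative email pattern: local part, "@", then dot-separated domain labels.
# Real-world address validation has many more edge cases.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# Hypothetical banned-word list; a real deployment would load a maintained list.
BANNED_WORDS = {"darn", "heck"}


def is_valid_email(address: str) -> bool:
    """Deterministic format check: same input, same answer, no API call."""
    return EMAIL_RE.fullmatch(address) is not None


def contains_banned_word(text: str) -> bool:
    """Exact pattern matching: split into lowercase words and look each one up."""
    words = re.findall(r"[a-z']+", text.lower())
    return any(word in BANNED_WORDS for word in words)


print(is_valid_email("student@example.org"))                   # True
print(is_valid_email("not-an-email"))                          # False
print(contains_banned_word("Oh heck, I forgot my homework"))   # True
print(contains_banned_word("Heckscher-Ohlin trade theory"))    # False (whole words only)
```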

### Composable App Use Case

**TaskAssigner** uses an InputGuardrail to validate user queries:

```python
# From agents/task_assigner.py:31-33
self.topic_guardrail = InputGuardrail(
    name="topic_guardrail",
    accept_condition="The topic is appropriate for K-12 educational content"
)
```

**Why LLM judge?** "Appropriate for K-12" is subjective and context-dependent:
- ✅ "World War II" → Appropriate (historical education)
- ✅ "Solving quadratic equations" → Appropriate (math)
- ❌ "How to hack a website" → Inappropriate (harmful)
- ❌ "Adult content" → Inappropriate (age-inappropriate)

A rule-based approach would need an ever-growing list of keywords and exceptions, and it would still misfire on context. An LLM judge reads the topic as a whole and can handle this nuance.
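To make the nuance problem concrete, here is a minimal sketch of a naive keyword blocklist applied to K-12 topics. The blocklist is a hypothetical example. It blocks two legitimate topics because they contain a listed word, and it lets a harmful request through because that request happens to avoid every listed word.

```python
import re

# Hypothetical keyword blocklist for "K-12 appropriate" topics.
BLOCKLIST = {"hack", "kill", "shooting", "war"}


def keyword_filter_accepts(topic: str) -> bool:
    """Accept the topic unless any of its words appears on the blocklist."""
    words = re.findall(r"[a-z']+", topic.lower())
    return not any(word in BLOCKLIST for word in words)


# (topic, what a human reviewer would decide)
cases = [
    ("World War II", True),                                   # history, but contains "war"
    ("Shooting stars and meteor showers", True),              # astronomy, but contains "shooting"
    ("How to break into a school's grading system", False),   # harmful, yet no listed keyword
]

for topic, should_accept in cases:
    got = keyword_filter_accepts(topic)
    verdict = "ok" if got == should_accept else "WRONG"
    print(f"{verdict:5} accepted={got!s:5} {topic}")
```

All three cases come out wrong. Each fix, such as allowing "war" in "World War", adds another special case, which is the maintenance burden described above.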

---

## Setup Cell

```python
# Add project root to path for imports
import sys
import os
from pathlib import Path

# Detect execution context
try:
    current_dir = Path(__file__).parent.resolve()
except NameError:
    current_dir = Path.cwd()

# Find repo root
repo_root = current_dir
while not (repo_root / '.env').exists() and repo_root != repo_root.parent:
    repo_root = repo_root.parent

# Add composable_app to path
composable_app_path = repo_root / 'composable_app'
sys.path.insert(0, str(composable_app_path))

# IMPORTANT: Change working directory to composable_app for PromptService
# PromptService uses relative path 'prompts/' which must resolve to composable_app/prompts/
os.chdir(composable_app_path)
print(f"✅ Working directory set to: {os.getcwd()}")

# Load environment variables
from dotenv import load_dotenv
env_path = repo_root / '.env'
load_dotenv(env_path)

# Check for OpenAI API key (InputGuardrail uses OpenAI via llms.py)
if not os.getenv('OPENAI_API_KEY'):
    raise EnvironmentError(
        "❌ OPENAI_API_KEY not found.\n"
        "   Get your key at: https://platform.openai.com/api-keys\n"
        "   Add to .env: OPENAI_API_KEY=sk-..."
    )

print(f"✅ Environment loaded from: {env_path}")
print("✅ OpenAI API key found")

# Guardrail dependencies
from utils.guardrails import InputGuardrail, InputGuardrailException
from utils.prompt_service import PromptService
from utils import llms
import asyncio

# Verify prompt templates are accessible
prompts_dir = Path('prompts')
if not prompts_dir.exists():
    raise FileNotFoundError(f"Prompts directory not found at: {prompts_dir.absolute()}")

print("✅ Setup complete")
print("⚠️ This notebook will make API calls. Estimated cost: $0.01-0.05")
print(f"   Using SMALL_MODEL: {llms.SMALL_MODEL} (fast & cheap)")
```

---

## InputGuardrail Implementation

### Architecture

```mermaid
graph LR
    A[User Query] --> B{InputGuardrail}
    B -->|LLM Judge| C{Accept Condition<br/>Met?}
    C -->|Yes| D[✅ Process Query]
    C -->|No| E[❌ Reject with<br/>Exception]
    
    D --> F[TaskAssigner]
    F --> G[Writer]
    
    style C fill:#fff3cd
    style D fill:#d4edda
    style E fill:#f8d7da
```

### InputGuardrail Class

From [`utils/guardrails.py:14-43`](../../utils/guardrails.py#L14-L43):

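```python
# Imports this excerpt relies on (from the top of the module; exact form assumed).
# InputGuardrailException is defined in the same module (utils/guardrails.py).
import logging

from pydantic_ai import Agent

from utils import llms
from utils.prompt_service import PromptService

logger = logging.getLogger(__name__)


class InputGuardrail:
    def __init__(self, name: str, accept_condition: str):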
        """Initialize guardrail with acceptance criteria.
        
        Args:
            name: Guardrail identifier for logging
            accept_condition: Condition text (e.g., 'The topic is appropriate for K-12')
        """
        self.name = name
        self.accept_condition = accept_condition
        
        # Create judge agent that returns boolean
        prompt = PromptService.render_prompt(
            "InputGuardrail_prompt",
            accept_condition=accept_condition
        )
        
        self.agent = Agent(
            llms.SMALL_MODEL,        # Fast & cheap for boolean decisions
            output_type=bool,        # Structured output: True or False
            system_prompt=prompt,
            retries=2
        )
    
    async def is_acceptable(self, prompt: str, raise_exception: bool = False) -> bool:
        """Check if input meets acceptance condition.
        
        Returns:
            True if acceptable, False otherwise
            
        Raises:
            InputGuardrailException: If raise_exception=True and input rejected
        """
        result = await self.agent.run(prompt)
        is_acceptable = result.output
        
        # Log decision
        logger.info(f"Guardrail {self.name}: {is_acceptable}")
        
        if not is_acceptable and raise_exception:
            raise InputGuardrailException(f"Input rejected by {self.name}")
        
        return is_acceptable
```

### Key Design Decisions

1. **Why SMALL_MODEL?**
   - Boolean decisions don't need advanced reasoning
   - Faster response time (typically ~100-300ms vs. ~500-1000ms, depending on provider and load)
   - Lower cost (~10x cheaper than BEST_MODEL)

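2. **Why `output_type=bool`?**
   - Pydantic AI requests structured output and validates it as a `bool`, so there is no free-text parsing
   - If the model's output fails validation, Pydantic AI asks it to try again (up to `retries=2` times here)
   - `result.output` is a Python `bool`, so downstream code stays type-safe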

3. **Why `raise_exception` parameter?**
   - **True**: Use in critical paths (block execution)
   - **False**: Use in monitoring (log but continue)
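The control flow of `is_acceptable` can be exercised offline. The sketch below mirrors the class above but swaps the Pydantic AI agent for a hypothetical async `judge` callable. `StubGuardrail` and `fake_k12_judge` are illustrative names, not part of `utils/guardrails.py`, and the keyword logic inside the fake judge only stands in for the LLM call. It runs as a standalone script; inside Jupyter, use `await main()` instead of `asyncio.run(main())`.

```python
import asyncio
from typing import Awaitable, Callable


class InputGuardrailException(Exception):
    """Local stand-in for utils.guardrails.InputGuardrailException."""


class StubGuardrail:
    """Offline sketch of the InputGuardrail control flow (no LLM involved)."""

    def __init__(self, name: str, judge: Callable[[str], Awaitable[bool]]):
        self.name = name
        self.judge = judge  # hypothetical async callable: text -> bool

    async def is_acceptable(self, prompt: str, raise_exception: bool = False) -> bool:
        is_acceptable = await self.judge(prompt)
        if not is_acceptable and raise_exception:
            # Critical path: block execution
            raise InputGuardrailException(f"Input rejected by {self.name}")
        # Monitoring path: just report the verdict
        return is_acceptable


async def fake_k12_judge(text: str) -> bool:
    # Placeholder logic only; the real guardrail asks an LLM.
    return "hack" not in text.lower()


async def main() -> None:
    guard = StubGuardrail("k12", fake_k12_judge)
    print(await guard.is_acceptable("Photosynthesis"))          # True
    print(await guard.is_acceptable("How to hack a website"))   # False (monitoring mode)
    try:
        await guard.is_acceptable("How to hack a website", raise_exception=True)
    except InputGuardrailException as e:
        print(f"Blocked: {e}")                                  # Blocked: Input rejected by k12


asyncio.run(main())
```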

---

## Boolean Judge Design

### The Guardrail Prompt

From [`prompts/InputGuardrail_prompt.j2`](../../prompts/InputGuardrail_prompt.j2):

```jinja2
You are a content moderation expert.

Your task is to evaluate whether the user's input meets this condition:
{{ accept_condition }}

Respond with:
- True: If the condition is met
- False: If the condition is NOT met

Be objective and apply the condition consistently.
```

### Designing Effective Accept Conditions

**Good accept conditions** are:
- ✅ **Clear**: "The topic is appropriate for K-12 education"
- ✅ **Specific**: "The query does not contain profanity or hate speech"
- ✅ **Binary**: Can be answered True/False
- ✅ **Objective**: Different judges would agree

**Bad accept conditions**:
- ❌ **Vague**: "The input is good" (what is "good"?)
- ❌ **Complex**: "The topic is educational AND engaging AND appropriate" (too many criteria)
- ❌ **Subjective**: "The query is interesting" (varies by person)
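One way to handle a compound condition is to split it into single-criterion conditions and require all of them. This also tells you *which* criterion failed. The sketch below assumes a hypothetical offline `fake_judge` that returns canned verdicts so it runs without an API key. In practice each condition would get its own `InputGuardrail`.

```python
import asyncio

# Hypothetical offline judge: canned verdict per condition so the example runs
# without an API key. A real judge would call an LLM with one condition in its
# system prompt, as InputGuardrail does.
CANNED_VERDICTS = {
    "The topic is educational": True,
    "The topic is engaging for students": False,
    "The topic is appropriate for K-12": True,
}


async def fake_judge(condition: str, text: str) -> bool:
    return CANNED_VERDICTS[condition]


async def check_each(text: str, conditions: list[str]) -> dict[str, bool]:
    """Run one single-criterion judge per condition, concurrently."""
    verdicts = await asyncio.gather(*(fake_judge(c, text) for c in conditions))
    return dict(zip(conditions, verdicts))


verdicts = asyncio.run(check_each("History of tax law", list(CANNED_VERDICTS)))
print(all(verdicts.values()))                       # False: at least one criterion failed
print([c for c, ok in verdicts.items() if not ok])  # ['The topic is engaging for students']
```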

### Examples

Let's create and test guardrails with different accept conditions:

```python
# Create guardrail for K-12 appropriateness
k12_guardrail = InputGuardrail(
    name="k12_appropriateness",
    accept_condition="The topic is appropriate for K-12 educational content"
)

print("✅ Created K-12 Appropriateness Guardrail")
print(f"   Guardrail name: {k12_guardrail.name}")
print(f"   Using model: {llms.SMALL_MODEL}")
```

---

## K-12 Content Appropriateness Demo

Let's test the guardrail with various topics:

```python
# Test cases: Some appropriate, some not
test_topics = [
    "Battle of the Bulge",                    # Should be appropriate (history)
    "Solving quadratic equations",            # Should be appropriate (math)
    "How photosynthesis works",               # Should be appropriate (science)
    "How to hack into a computer system",     # Should NOT be appropriate (harmful)
    "Explicit adult content",                 # Should NOT be appropriate (age-inappropriate)
]

print("🧪 Testing K-12 Appropriateness Guardrail\n")
print("=" * 80)

results = []

for topic in test_topics:
    print(f"\n📝 Topic: '{topic}'")
    
    # Check guardrail (no exception, just return boolean)
    is_appropriate = await k12_guardrail.is_acceptable(topic, raise_exception=False)
    
    result_icon = "✅" if is_appropriate else "❌"
    result_text = "ACCEPTED" if is_appropriate else "REJECTED"
    
    print(f"   {result_icon} Result: {result_text}")
    
    results.append({
        "topic": topic,
        "appropriate": is_appropriate
    })
    
    print("-" * 80)

print(f"\n📊 Summary: {sum(r['appropriate'] for r in results)}/{len(results)} topics accepted")
```

### Understanding Results

**Expected behavior**:
- ✅ Educational topics (history, math, science) → Accepted
- ❌ Harmful topics (hacking, adult content) → Rejected

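**Why this works**:
- The LLM has absorbed broad knowledge of what schools teach and what is considered age-appropriate
- It reads the topic in context rather than matching keywords
- "Battle of the Bulge" → Recognized as a legitimate history topic despite its military subject
- "How to hack" → Recognized as a request for potentially harmful instructions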

**Note**: LLM judges can occasionally make mistakes. For production:
1. Test with comprehensive examples
2. Monitor and log all decisions
3. Combine with rule-based checks for critical cases

---

## Parallel Guardrail Execution

### Why Parallel Execution?

When multiple guardrails validate the same input:
- ❌ **Sequential**: Total latency ≈ sum of all guardrail latencies (slow)
- ✅ **Parallel**: Total latency ≈ latency of the slowest guardrail (faster)

**Example**: 3 guardrails, each taking 200ms:
- Sequential: ~600ms total
- Parallel: ~200ms total (up to 3x faster; in practice somewhat less because of network variance)
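You can see the sum-vs-max behavior without any API calls. This minimal sketch simulates each guardrail with `asyncio.sleep`. The latencies are arbitrary assumptions chosen so the sum (0.6s) and the max (0.3s) differ clearly. It runs as a script; in a notebook, await the coroutines directly instead of calling `asyncio.run`.

```python
import asyncio
import time


async def fake_guardrail(latency_s: float) -> bool:
    """Simulated guardrail: sleep to mimic an LLM round-trip, then accept."""
    await asyncio.sleep(latency_s)
    return True


async def run_sequential(latencies: list[float]) -> list[bool]:
    # Each check starts only after the previous one finishes.
    return [await fake_guardrail(s) for s in latencies]


async def run_parallel(latencies: list[float]) -> list[bool]:
    # All checks start together; gather waits for the slowest one.
    return list(await asyncio.gather(*(fake_guardrail(s) for s in latencies)))


def timed(coro_fn, latencies: list[float]) -> float:
    start = time.perf_counter()
    asyncio.run(coro_fn(latencies))
    return time.perf_counter() - start


latencies = [0.1, 0.2, 0.3]  # three guardrails with different simulated latencies
seq = timed(run_sequential, latencies)  # roughly 0.1 + 0.2 + 0.3 = 0.6s
par = timed(run_parallel, latencies)    # roughly max(0.1, 0.2, 0.3) = 0.3s
print(f"sequential ≈ {seq:.2f}s, parallel ≈ {par:.2f}s, speedup ≈ {seq / par:.1f}x")
```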

### Implementation

From [`agents/task_assigner.py:65-67`](../../agents/task_assigner.py#L65-L67):

```python
# Parallel execution: guardrail + agent classification
_, result = await asyncio.gather(
    self.topic_guardrail.is_acceptable(topic, raise_exception=True),  # Blocks if rejected
    self.agent.run(prompt)                                             # Runs concurrently
)
```

### Demo: Multiple Guardrails in Parallel

```python
# Create multiple guardrails
guardrails = [
    InputGuardrail(
        name="k12_appropriate",
        accept_condition="The topic is appropriate for K-12 educational content"
    ),
    InputGuardrail(
        name="no_profanity",
        accept_condition="The text does not contain profanity or offensive language"
    ),
    InputGuardrail(
        name="educational_intent",
        accept_condition="The query has clear educational intent"
    ),
]

print(f"✅ Created {len(guardrails)} guardrails\n")

# Test topic
topic = "Explain the water cycle"
print(f"📝 Testing: '{topic}'\n")

# Sequential execution (for comparison)
import time

print("⏱️ Sequential Execution:")
start = time.perf_counter()
for guard in guardrails:
    result = await guard.is_acceptable(topic)
    print(f"   {guard.name}: {result}")
sequential_time = time.perf_counter() - start
print(f"   Total: {sequential_time:.2f}s\n")

# Parallel execution
print("⚡ Parallel Execution:")
start = time.perf_counter()
results = await asyncio.gather(
    *[guard.is_acceptable(topic) for guard in guardrails]
)
parallel_time = time.perf_counter() - start

for guard, result in zip(guardrails, results):
    print(f"   {guard.name}: {result}")
print(f"   Total: {parallel_time:.2f}s\n")

speedup = sequential_time / parallel_time
print(f"🚀 Speedup: {speedup:.1f}x faster with parallel execution")
```

### Error Handling with Parallel Guardrails

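**Question**: What happens if one guardrail rejects but we need all of them to pass?

**Answer**: Use `raise_exception=True`. With its default `return_exceptions=False`, `asyncio.gather()` propagates the first exception to the awaiting code as soon as it is raised. It does **not** cancel the other guardrail calls, though: they keep running (and incur cost) in the background.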

```python
# Demo: Exception handling
bad_topic = "How to hack into systems"

print(f"📝 Testing bad topic: '{bad_topic}'\n")

try:
    # Parallel execution with exception on failure
    results = await asyncio.gather(
        *[guard.is_acceptable(bad_topic, raise_exception=True) for guard in guardrails]
    )
    print("✅ All guardrails passed")
except InputGuardrailException as e:
    print(f"❌ Guardrail rejected: {e}")
    print("   Query blocked before processing")
```
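Because `gather()` leaves the remaining calls running after the first exception, you may want to cancel an expensive agent call as soon as a guardrail rejects. Below is a minimal sketch using `asyncio.wait(..., return_when=asyncio.FIRST_EXCEPTION)`. The guardrail and agent are simulated with sleeps, and `guardrail_check`, `expensive_agent_call`, and `guarded_run` are hypothetical names, not functions from the app.

```python
import asyncio
import time


class InputGuardrailException(Exception):
    """Local stand-in for utils.guardrails.InputGuardrailException."""


async def guardrail_check(topic: str) -> bool:
    # Simulated fast guardrail (hypothetical rule in place of an LLM judge).
    await asyncio.sleep(0.05)
    if "hack" in topic.lower():
        raise InputGuardrailException(f"Input rejected: {topic!r}")
    return True


async def expensive_agent_call(topic: str) -> str:
    # Simulated slow agent call we would rather stop paying for.
    await asyncio.sleep(0.5)
    return f"classification for {topic!r}"


async def guarded_run(topic: str) -> str:
    """Run guardrail and agent concurrently; cancel the rest on the first failure."""
    guard = asyncio.create_task(guardrail_check(topic))
    agent = asyncio.create_task(expensive_agent_call(topic))
    done, pending = await asyncio.wait({guard, agent}, return_when=asyncio.FIRST_EXCEPTION)
    for task in done:
        if task.exception() is not None:
            for other in pending:
                other.cancel()
            # Wait until cancellation completes so nothing keeps running unseen.
            await asyncio.gather(*pending, return_exceptions=True)
            raise task.exception()
    # No exception: FIRST_EXCEPTION waited for both tasks to finish.
    return agent.result()


async def main() -> None:
    start = time.perf_counter()
    try:
        await guarded_run("How to hack into systems")
    except InputGuardrailException as e:
        print(f"Blocked after {time.perf_counter() - start:.2f}s: {e}")
    print(await guarded_run("Explain the water cycle"))


asyncio.run(main())
```

On Python 3.11+, `asyncio.TaskGroup` gives the same cancel-on-failure behavior. It wraps the error in an `ExceptionGroup`, though, so you would catch it with `except*`.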

---

## Creating Custom Guardrails

### Custom Guardrail Examples

You can create guardrails for any domain or policy:

```python
# Example 1: Toxicity detection
toxicity_guard = InputGuardrail(
    name="toxicity_detection",
    accept_condition="The text does not contain toxic, hateful, or offensive language"
)

# Example 2: Medical compliance
medical_guard = InputGuardrail(
    name="medical_compliance",
    accept_condition="The query does not request medical diagnosis or treatment advice"
)

# Example 3: Legal compliance
legal_guard = InputGuardrail(
    name="legal_compliance",
    accept_condition="The query does not request legal advice or services"
)

# Example 4: Domain-specific (coding assistant)
coding_guard = InputGuardrail(
    name="coding_appropriate",
    accept_condition="The query is related to software development, programming, or technology"
)

# Example 5: PII detection
pii_guard = InputGuardrail(
    name="no_pii",
    accept_condition="The text does not contain personally identifiable information (names, addresses, SSN, credit cards)"
)

print("✅ Created 5 custom guardrails:")
for guard in [toxicity_guard, medical_guard, legal_guard, coding_guard, pii_guard]:
    print(f"   - {guard.name}")
```

### Testing Custom Guardrails

```python
# Test cases for different guardrails
test_cases = [
    ("Can you help me debug this Python function?", coding_guard, True),
    ("How do I make spaghetti carbonara?", coding_guard, False),
    ("I need advice on a legal contract", legal_guard, False),
    ("What is contract law?", legal_guard, True),  # Educational, not advice
]

print("🧪 Testing Custom Guardrails\n")
print("=" * 80)

for query, guard, expected in test_cases:
    result = await guard.is_acceptable(query)
    status = "✅ PASS" if result == expected else "❌ FAIL"
    
    print(f"\n{status} {guard.name}")
    print(f"   Query: '{query}'")
    print(f"   Expected: {expected}, Got: {result}")
    print("-" * 80)
```
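To turn the PASS/FAIL lines into numbers you can track over time, you could collect `(expected, result)` pairs from a test loop like the one above and score them. The sketch below treats a rejection as the guardrail "flagging" an input. `score_guardrail` is a hypothetical helper, and the sample pairs are made up for illustration.

```python
def score_guardrail(results: list[tuple[bool, bool]]) -> dict[str, float]:
    """Score (expected_accept, got_accept) pairs for one guardrail.

    false_reject: acceptable input that was blocked (false positive)
    false_accept: unacceptable input that got through (false negative)
    """
    correct = sum(1 for expected, got in results if expected == got)
    false_reject = sum(1 for expected, got in results if expected and not got)
    false_accept = sum(1 for expected, got in results if not expected and got)
    return {
        "correct": correct,
        "false_reject": false_reject,
        "false_accept": false_accept,
        "accuracy": correct / len(results) if results else 0.0,
    }


# Made-up (expected, got) pairs for illustration
pairs = [(True, True), (False, False), (False, True), (True, True)]
print(score_guardrail(pairs))
# {'correct': 3, 'false_reject': 0, 'false_accept': 1, 'accuracy': 0.75}
```

For a safety guardrail, `false_accept` is usually the number to watch most closely. `false_reject` measures how often legitimate users get blocked.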

---

## Common Pitfalls

### ❌ Pitfall 1: Using BEST_MODEL for Guardrails

**Problem**: Using expensive, slow model for simple boolean decisions

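```python
# ❌ BAD: Using BEST_MODEL for a yes/no decision
from pydantic_ai import Agent
from utils import llms

agent = Agent(
    llms.BEST_MODEL,  # larger, slower model: overkill for a boolean verdict
    output_type=bool,
)
# Cost: ~$0.10 per 1000 checks (rough estimate; depends on model and prompt size)
# Latency: ~500-1000ms
```

**Solution**: Use SMALL_MODEL

```python
# ✅ GOOD: Using SMALL_MODEL
from pydantic_ai import Agent
from utils import llms

agent = Agent(
    llms.SMALL_MODEL,  # small, fast model: sufficient for a boolean verdict
    output_type=bool,
)
# Cost: ~$0.01 per 1000 checks (roughly 10x cheaper; rough estimate)
# Latency: ~100-300ms (roughly 3x faster)
```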

---

### ❌ Pitfall 2: Missing Exception Handling

**Problem**: An uncaught guardrail exception propagates up and crashes the script (or fails the whole request in a server)

```python
# ❌ BAD: No exception handling
writer = await assigner.assign_writer(user_query)
# If guardrail rejects, app crashes with InputGuardrailException
```

**Solution**: Catch and handle gracefully

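```python
# ✅ GOOD: Handle rejection gracefully
import logging

from utils.guardrails import InputGuardrailException

logger = logging.getLogger(__name__)


async def handle_query(assigner, user_query: str):
    try:
        writer = await assigner.assign_writer(user_query)
    except InputGuardrailException as e:
        logger.warning(f"Query rejected: {e}")
        return {"error": "Your query does not meet content guidelines"}
    return writer  # Continue processing with the assigned writer
```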

---

### ❌ Pitfall 3: Vague Accept Conditions

**Problem**: Inconsistent or unpredictable guardrail behavior

```python
# ❌ BAD: Too vague
guard = InputGuardrail(
    name="quality",
    accept_condition="The input is good quality"
)
# What is "good"? Results will be inconsistent
```

**Solution**: Be specific and objective

```python
# ✅ GOOD: Clear, specific criteria
guard = InputGuardrail(
    name="query_clarity",
    accept_condition="The query is a complete sentence with clear intent"
)
```

---

### ❌ Pitfall 4: Not Logging Guardrail Decisions

**Problem**: Can't debug why queries were rejected

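**Solution**: Always log decisions. InputGuardrail logs each decision via `logger.info`; where those records end up depends on the app's logging configuration. If they are written as JSON lines to `logs/guards.json`, you can inspect them like this:

```python
# Read guardrail decisions from a JSON-lines log.
# Assumption: each line is a JSON object with 'timestamp', 'guardrail_name',
# and 'result' keys; adjust the field names to match your actual log format.
import json

with open("logs/guards.json") as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        decision = json.loads(line)
        print(f"{decision['timestamp']}: {decision['guardrail_name']} → {decision['result']}")
```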
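If your setup does not yet produce such a file, here is a minimal sketch of a writer that appends one JSON object per decision. The schema (`timestamp`, `guardrail_name`, `input`, `result`) and the helper names are hypothetical. The example writes to a temporary directory so it can run anywhere.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def log_guardrail_decision(log_path: Path, guardrail_name: str, text: str, result: bool) -> None:
    """Append one decision as a JSON line (hypothetical schema)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "guardrail_name": guardrail_name,
        "input": text,
        "result": result,
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def read_decisions(log_path: Path) -> list[dict]:
    """Load all decisions, skipping blank lines."""
    with log_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


log_file = Path(tempfile.mkdtemp()) / "guards.json"
log_guardrail_decision(log_file, "k12_appropriateness", "Battle of the Bulge", True)
log_guardrail_decision(log_file, "k12_appropriateness", "How to hack a website", False)

rejected = [d for d in read_decisions(log_file) if not d["result"]]
print(len(rejected))  # 1
```

Storing the raw input makes rejections easy to debug. It can also store PII in your logs, so decide deliberately whether to keep, truncate, or hash that field.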

---

### 💡 Tip: Combine LLM Judge with Rule-Based Checks

**Best practice**: Use both for defense-in-depth

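```python
import re

from utils.guardrails import InputGuardrail

# Hypothetical banned-word pattern; substitute your own maintained list.
PROFANITY_RE = re.compile(r"\b(badword1|badword2)\b", re.IGNORECASE)


async def combined_validation(text: str, guardrail: InputGuardrail) -> bool:
    # Layer 1: Fast rule-based checks (no API call)
    if PROFANITY_RE.search(text):  # Regex check - effectively instant
        return False

    if len(text) > 10_000:  # Length check
        return False

    # Layer 2: LLM judge for nuanced validation
    return await guardrail.is_acceptable(text)
```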

---

## Self-Assessment

### Question 1: Concept Check
**When should you use LLM-as-Judge instead of rule-based validation?**

<details>
<summary>Click to reveal answer</summary>

**Answer**: Use LLM-as-Judge when:

1. **Criteria are subjective**: "Is this tone appropriate?" - No clear rules
2. **Context matters**: "Is 'shooting' appropriate?" - Depends on context (photography vs. violence)
3. **Rapid iteration needed**: Changing prompts is easier than rewriting code
4. **Nuance required**: "Age-appropriate" varies by topic complexity

**Use rule-based when**:
- Deterministic validation needed (email format, phone number)
- Speed is critical (a sub-10ms budget rules out a network round-trip to an LLM)
- Exact pattern matching (profanity list, banned words)
- Cost must be minimal (a regex check is essentially free; each LLM check costs a small fraction of a cent)

**Best**: Combine both - rule-based for fast filtering, LLM for nuanced decisions
</details>

---

### Question 2: Implementation
**Why does TaskAssigner use `asyncio.gather()` for guardrail + agent classification?**

<details>
<summary>Click to reveal answer</summary>

**Answer**: Parallel execution for speed:

```python
# From agents/task_assigner.py:65-67
_, result = await asyncio.gather(
    self.topic_guardrail.is_acceptable(topic, raise_exception=True),
    self.agent.run(prompt)
)
```

**Why parallel?**
- Guardrail check: ~200ms
- Agent classification: ~300ms
- **Sequential**: 200ms + 300ms = 500ms total
- **Parallel**: max(200ms, 300ms) = 300ms total (1.7x faster)

**Behavior**:
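- If the guardrail rejects (raises `InputGuardrailException`), `gather()` propagates the exception to the caller immediately, so no writer is assigned
- `gather()` does **not** cancel the agent call: it keeps running until it finishes, and its result is discarded
- If both succeed, we get the classification result, saving ~200ms per request on the happy path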
</details>

---

### Question 3: Design Trade-offs
**What are the downsides of using LLM-as-Judge for guardrails?**

<details>
<summary>Click to reveal answer</summary>

**Answer**: Key limitations:

1. **Not 100% deterministic**: Same query might get different results occasionally
   - Solution: Use temperature=0 (reduces, but does not eliminate, run-to-run variation), clear conditions, and monitoring of decisions

2. **Added latency**: 100-300ms vs. <1ms for regex
   - Solution: Use SMALL_MODEL, parallel execution, caching

3. **API costs**: roughly $0.0001 per check with a small model (vs. effectively $0 for rule-based)
   - Solution: Acceptable for most use cases, add rule-based pre-filter

4. **Potential bias**: LLM may have cultural or political biases
   - Solution: Test extensively, use diverse examples, monitor in production

5. **Failure modes**: API down, rate limits, timeout
   - Solution: Implement retries, fallback to rule-based, circuit breaker

**When acceptable**: Most applications where nuanced validation matters more than perfect determinism
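Several of the mitigations above (caching, bounded latency, falling back to a rule) can be combined in one wrapper. This is a minimal sketch. `ResilientGuardrail`, the fallback regex, and the simulated judges are hypothetical names, and a real deployment would also bound the cache size and choose deliberately whether failures should fail open or closed.

```python
import asyncio
import re

# Hypothetical rule-based fallback used when the LLM judge is unavailable.
FALLBACK_BLOCKLIST = re.compile(r"\b(hack|exploit)\b", re.IGNORECASE)


class ResilientGuardrail:
    """Sketch: cache verdicts, bound latency with a timeout, fall back to a rule.

    `judge` is a hypothetical async callable (text -> bool) standing in for
    an LLM-backed check such as InputGuardrail.is_acceptable.
    """

    def __init__(self, judge, timeout_s: float = 2.0):
        self.judge = judge
        self.timeout_s = timeout_s
        self.cache: dict[str, bool] = {}

    async def is_acceptable(self, text: str) -> bool:
        key = " ".join(text.lower().split())  # normalize case and whitespace
        if key in self.cache:
            return self.cache[key]
        try:
            verdict = await asyncio.wait_for(self.judge(text), timeout=self.timeout_s)
        except Exception:  # timeout, rate limit, network error, ...
            # Fallback verdicts are not cached, so the LLM is retried next time.
            return FALLBACK_BLOCKLIST.search(text) is None
        self.cache[key] = verdict
        return verdict


calls = 0


async def counting_judge(text: str) -> bool:
    global calls
    calls += 1
    return "hack" not in text.lower()


async def slow_judge(text: str) -> bool:
    await asyncio.sleep(5)  # simulates a hung API call
    return True


async def main() -> None:
    cached = ResilientGuardrail(counting_judge)
    await cached.is_acceptable("Explain the water cycle")
    await cached.is_acceptable("  explain the WATER cycle ")  # cache hit
    print(calls)  # 1
    flaky = ResilientGuardrail(slow_judge, timeout_s=0.05)
    print(await flaky.is_acceptable("How to hack a website"))  # False, via the fallback rule


asyncio.run(main())
```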
</details>

---

### Question 4: Advanced
**How would you implement a guardrail that rejects PII (personally identifiable information)?**

<details>
<summary>Click to reveal answer</summary>

**Answer**: Hybrid approach (rule-based + LLM):

```python
import re

# Layer 1: Fast regex checks for obvious PII
def quick_pii_check(text: str) -> bool:
    """Rule-based pre-filter for common PII patterns."""
    
    # SSN pattern: XXX-XX-XXXX
    if re.search(r'\d{3}-\d{2}-\d{4}', text):
        return True
    
    # Credit card: 16 digits
    if re.search(r'\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}', text):
        return True
    
    # Email addresses
    if re.search(r'[\w\.-]+@[\w\.-]+\.\w+', text):
        return True
    
    return False

# Layer 2: LLM judge for nuanced PII
pii_guardrail = InputGuardrail(
    name="pii_detection",
    accept_condition="""The text does not contain personally identifiable information including:
    - Full names with context (e.g., 'John Smith lives at...')
    - Home addresses
    - Phone numbers
    - Social Security Numbers
    - Credit card numbers
    - Driver's license numbers
    - Medical record numbers
    
    Generic names without context are acceptable (e.g., 'Alice and Bob example').
    """
)

# Combined validation
async def contains_pii(text: str) -> bool:
    # Quick check first (fast reject)
    if quick_pii_check(text):
        return True
    
    # LLM check for nuanced cases
    return not await pii_guardrail.is_acceptable(text)
```

**Why hybrid?**
- Regex catches well-formatted PII (SSNs, card numbers, emails) instantly, with no API cost
- LLM catches edge cases ("My address is 123 Main Street" - no regex pattern)
- Best of both worlds: speed + nuance
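The 16-digit card regex also matches order numbers, tracking IDs, and similar strings. A common refinement is to keep only matches that pass the Luhn checksum, which valid payment card numbers satisfy. This is a minimal sketch reusing the same pattern. `luhn_valid` and `contains_card_number` are hypothetical helper names, and `4111 1111 1111 1111` is a widely used test card number.

```python
import re

# Same 16-digit pattern as quick_pii_check
CARD_RE = re.compile(r"\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}")


def luhn_valid(number: str) -> bool:
    """Luhn checksum over the digits of `number` (separators ignored)."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Double every second digit from the right; subtract 9 if the result exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def contains_card_number(text: str) -> bool:
    """True only if some 16-digit match also passes the Luhn check."""
    return any(luhn_valid(m.group()) for m in CARD_RE.finditer(text))


print(contains_card_number("Card: 4111 1111 1111 1111"))      # True
print(contains_card_number("Order id 1234-5678-9012-3456"))   # False (fails Luhn)
```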
</details>

---

## Book References & Further Reading

### Generative AI Design Patterns (Lakshmanan & Hapke, O'Reilly 2025)

📖 **Pattern 17: LLM-as-Judge**
- Judge design patterns
- Evaluation vs. validation use cases
- Structured output parsing
- Judge bias and mitigation

📖 **Pattern 32: Guardrails**
- Input/output validation architecture
- Multi-layer defense strategies
- Performance optimization
- Production deployment

**Related Patterns**:
- **Pattern 18**: "Reflection" - Self-correction with judges
- **Pattern 25**: "Prompt Caching" - Template-based guardrail prompts
- **Pattern 30**: "Structured Outputs" - Boolean and Pydantic models

### External Resources
- [Pydantic AI Structured Outputs](https://ai.pydantic.dev/)
- [Guardrails AI Library](https://github.com/guardrails-ai/guardrails) - Additional validation layers
- [NIST AI Safety Guidelines](https://www.nist.gov/artificial-intelligence)

---

## Next Steps

### Continue Learning
1. **[Horizontal Services](../concepts/horizontal_services.md)** - Guardrails in the broader architecture
2. **[Evaluation Tutorial](evaluation_tutorial.ipynb)** - Using LLM-as-judge for quality metrics
3. **[Multi-Agent Pattern](multi_agent_pattern.ipynb)** - ReviewerPanel as judge ensemble

### Hands-On Practice
1. **Create domain-specific guardrail**: Design for your application (medical, legal, etc.)
2. **Test edge cases**: Find queries that fool the guardrail
3. **Measure latency**: Compare SMALL_MODEL vs. BEST_MODEL vs. rule-based
4. **Monitor in production**: Log all decisions, track false positives/negatives

### Advanced Exercises
1. **Multi-criteria guardrails**: Single judge evaluating multiple conditions
2. **Confidence scores**: Return probability instead of boolean
3. **Ensemble judges**: Multiple LLMs voting on decision
4. **Adaptive guardrails**: Adjust strictness based on user reputation
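As a starting point for the ensemble exercise, here is a minimal sketch of majority voting across several judges run concurrently. The three judges are hypothetical stand-ins for different LLM-backed guardrails. A strict majority is required, so ties reject.

```python
import asyncio


async def majority_vote(judges, text: str) -> bool:
    """Accept only if a strict majority of judges accept (ties reject)."""
    verdicts = await asyncio.gather(*(judge(text) for judge in judges))
    return sum(verdicts) > len(verdicts) / 2


# Hypothetical stand-ins for three different LLM judges.
async def strict_judge(text: str) -> bool:
    return "hack" not in text.lower()


async def lenient_judge(text: str) -> bool:
    return True


async def phrase_judge(text: str) -> bool:
    return "break into" not in text.lower()


judges = [strict_judge, lenient_judge, phrase_judge]
print(asyncio.run(majority_vote(judges, "Explain the water cycle")))                       # True (3 of 3)
print(asyncio.run(majority_vote(judges, "How to hack and break into a school network")))   # False (1 of 3)
```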

---

**Congratulations!** You've learned Pattern 17 & 32: LLM-as-Judge and Guardrails. You can now build safe, policy-compliant LLM applications with intelligent input validation.

**Tutorial Version**: 1.0  
**Last Updated**: 2025-11-04  
**Estimated Time**: 25-30 minutes  
**API Cost**: ~$0.01-0.05