# Iteration 5: Human-in-the-Loop

## Overview

In this iteration, we add human review capabilities for cases where the AI needs help. This is crucial for:
- Handling edge cases beyond AI capabilities
- Building user trust through transparency
- Continuous improvement through feedback
- Meeting compliance/regulatory requirements

## What You'll Learn
1. **Uncertainty Detection** - When should AI ask for help?
2. **Human Review Interface** - How to present information for a human decision
3. **Feedback Integration** - Incorporating human input into the flow
4. **Async Patterns** - Handling human input without blocking

## The Concept: When and Why to Ask Humans

### When to Request Human Review:
1. **Low Confidence** (<50%) - AI is unsure
2. **Expert Disagreement** - Different agents reach different conclusions
3. **High Stakes** - Critical decisions need human oversight
4. **No Evidence** - Claims that can't be verified with available tools
5. **Explicit Request** - User wants human verification
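
To make the five criteria above concrete, here is a minimal sketch of a checker that collects the reasons a review might be requested. It is not the code in `modules.m5_human_in_loop`: the `Assessment` container, its field names, and the `review_reasons` function are hypothetical, and the 50% cutoff simply mirrors the threshold listed above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Assessment:
    """Hypothetical container for one AI assessment (not the module's state class)."""
    verdict: str
    confidence: int  # 0-100
    expert_verdicts: Dict[str, str] = field(default_factory=dict)
    high_stakes: bool = False
    needs_evidence: bool = False
    evidence_count: int = 0
    user_requested_review: bool = False


def review_reasons(a: Assessment, min_confidence: int = 50) -> List[str]:
    """Return every criterion that fires; an empty list means no review is needed."""
    reasons = []
    if a.confidence < min_confidence:
        reasons.append(f"Low confidence: {a.confidence}%")
    # More than one distinct verdict among the experts counts as disagreement
    if len(set(a.expert_verdicts.values())) > 1:
        reasons.append("Expert disagreement")
    if a.high_stakes:
        reasons.append("High-stakes decision")
    if a.needs_evidence and a.evidence_count == 0:
        reasons.append("No evidence found")
    if a.user_requested_review:
        reasons.append("Explicit user request")
    return reasons
```

Returning the list of reasons, rather than a single boolean, means the same result can later be shown to the reviewer as the explanation for why they were asked.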

In [2]:
import sys
sys.path.append('..')

from modules.m5_human_in_loop import (
    HumanReviewRequest,
    HumanFeedback,
    HumanInLoopState,
    calculate_uncertainty,
    check_claim_with_human_in_loop,
    interactive_human_input,
    create_human_in_loop_bs_detector
)
from config.llm_factory import LLMFactory
import json
from IPython.display import display, HTML
import warnings
warnings.filterwarnings('ignore', category=RuntimeWarning, module='langchain_community')

print("✅ Imports successful!")

✅ Imports successful!


In [3]:
# Create test states with different uncertainty levels
test_states = [
    # High confidence, clear verdict
    HumanInLoopState(
        claim="The Boeing 747 has four engines",
        verdict="LEGITIMATE", 
        confidence=95,
        claim_type="technical"
    ),
    
    # Low confidence
    HumanInLoopState(
        claim="Some new aviation technology was announced",
        verdict="LEGITIMATE",
        confidence=40,
        claim_type="current_event"
    ),
    
    # Expert disagreement
    HumanInLoopState(
        claim="Planes will be fully autonomous by 2025",
        verdict="BS",
        confidence=70,
        expert_opinions={
            "technical_expert": {"verdict": "BS", "confidence": 80},
            "general_expert": {"verdict": "LEGITIMATE", "confidence": 60}
        }
    )
]

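# Calculate uncertainty for each
# NOTE: the 0.6 cutoff used below is local to this cell. The module's own
# review decision follows different rules: in the forced-review test later in
# this notebook, a score of 0.40 still triggers human review.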
print("🔍 Uncertainty Analysis:\n")
for i, state in enumerate(test_states, 1):
    uncertainty = calculate_uncertainty(state)
    print(f"Test {i}: {state.claim[:50]}...")
    print(f"  Confidence: {state.confidence}%")
    print(f"  Uncertainty Score: {uncertainty:.2f}")
    print(f"  Needs Review: {'Yes' if uncertainty > 0.6 else 'No'}")
    print()

🔍 Uncertainty Analysis:

Test 1: The Boeing 747 has four engines...
  Confidence: 95%
  Uncertainty Score: 0.00
  Needs Review: No

Test 2: Some new aviation technology was announced...
  Confidence: 40%
  Uncertainty Score: 0.50
  Needs Review: No

Test 3: Planes will be fully autonomous by 2025...
  Confidence: 70%
  Uncertainty Score: 0.30
  Needs Review: No
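
The body of `calculate_uncertainty` is not shown in this notebook. As a rough sketch only, here is one additive scheme that happens to reproduce the scores printed in this notebook (0.00, 0.50 and 0.30 above, and the 0.40 from the forced-review test later on). The function name `sketch_uncertainty`, the `has_evidence` parameter, and the weights 0.4, 0.3 and 0.1 are all assumptions chosen to match those outputs; the real module may compute its score differently.

```python
from typing import Dict, Optional


def sketch_uncertainty(confidence: int,
                       claim_type: str = "general",
                       expert_opinions: Optional[Dict[str, dict]] = None,
                       has_evidence: bool = False) -> float:
    """Hypothetical multi-factor score in [0, 1]; the weights are guesses."""
    score = 0.0
    # Factor 1: low model confidence
    if confidence < 50:
        score += 0.4
    # Factor 2: experts that do not share a single verdict
    verdicts = {op["verdict"] for op in (expert_opinions or {}).values()}
    if len(verdicts) > 1:
        score += 0.3
    # Factor 3: a time-sensitive claim with nothing to back it up
    if claim_type == "current_event" and not has_evidence:
        score += 0.1
    return min(score, 1.0)
```

An additive design like this keeps each factor easy to explain to a reviewer, at the cost of having to tune the weights by hand.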



## 2. Human Review Request Format

When we need human input, we present all relevant information clearly:

In [4]:
# Create a sample review request
review_request = HumanReviewRequest(
    claim="ChatGPT-5 was released yesterday with consciousness",
    ai_verdict="BS",
    ai_confidence=45,
    ai_reasoning="Claim about consciousness is dubious and no official announcement found",
    uncertainty_reasons=[
        "Very low confidence: 45%",
        "No evidence found for recent event",
        "Extraordinary claim requiring extraordinary evidence"
    ],
    expert_opinions={
        "Current Events Expert": {
            "verdict": "BS",
            "confidence": 45,
            "reasoning": "No official announcement from OpenAI found"
        },
        "Technical Expert": {
            "verdict": "BS", 
            "confidence": 90,
            "reasoning": "Claims of consciousness are not scientifically supported"
        }
    },
    search_results=[
        {"fact": "OpenAI has not announced ChatGPT-5 as of latest search"},
        {"fact": "No credible sources report AI consciousness breakthrough"}
    ]
)

# Display formatted request
print(review_request.format_for_human())


🤔 HUMAN REVIEW REQUESTED

**Claim**: ChatGPT-5 was released yesterday with consciousness

**AI Assessment**:
- Verdict: BS
- Confidence: 45%
- Reasoning: Claim about consciousness is dubious and no official announcement found

**Uncertainty Reasons**:
- Very low confidence: 45%
- No evidence found for recent event
- Extraordinary claim requiring extraordinary evidence

**Expert Opinions**:

Current Events Expert:
  - Verdict: BS
  - Confidence: 45%
  - Reasoning: No official announcement from OpenAI found

Technical Expert:
  - Verdict: BS
  - Confidence: 90%
  - Reasoning: Claims of consciousness are not scientifically supported

**Search Results**: 2 results found
  1. OpenAI has not announced ChatGPT-5 as of latest search...
  2. No credible sources report AI consciousness breakthrough...
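
For readers curious how a formatter like this could be put together, here is a minimal sketch. `ReviewRequestSketch` is a hypothetical stand-in, not the module's `HumanReviewRequest`; it covers only the claim, the AI assessment, and the uncertainty reasons, and it assumes a plain dataclass.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReviewRequestSketch:
    """Hypothetical, trimmed-down review request used only for illustration."""
    claim: str
    ai_verdict: str
    ai_confidence: int
    uncertainty_reasons: List[str] = field(default_factory=list)

    def format_for_human(self) -> str:
        # Build the report line by line so optional sections are easy to skip
        lines = [
            "HUMAN REVIEW REQUESTED",
            "",
            f"**Claim**: {self.claim}",
            "",
            "**AI Assessment**:",
            f"- Verdict: {self.ai_verdict}",
            f"- Confidence: {self.ai_confidence}%",
        ]
        if self.uncertainty_reasons:
            lines += ["", "**Uncertainty Reasons**:"]
            lines += [f"- {reason}" for reason in self.uncertainty_reasons]
        return "\n".join(lines)
```

Omitting empty sections keeps the request short, which matters when a reviewer is working through a queue of them.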




## 3. Simulated Human Feedback

Let's see how human feedback is structured:

In [5]:
# Example human feedback
human_feedback = HumanFeedback(
    verdict="BS",
    confidence=95,
    reasoning="No such announcement exists. ChatGPT-5 has not been released.",
    additional_context="I checked OpenAI's official channels and tech news sites.",
    sources=["https://openai.com/blog", "Tech news aggregators"]
)

print("🧑 Human Feedback:")
print(f"Verdict: {human_feedback.verdict}")
print(f"Confidence: {human_feedback.confidence}%")
print(f"Reasoning: {human_feedback.reasoning}")
if human_feedback.additional_context:
    print(f"Context: {human_feedback.additional_context}")
if human_feedback.sources:
    print(f"Sources: {', '.join(human_feedback.sources)}")

🧑 Human Feedback:
Verdict: BS
Confidence: 95%
Reasoning: No such announcement exists. ChatGPT-5 has not been released.
Context: I checked OpenAI's official channels and tech news sites.
Sources: https://openai.com/blog, Tech news aggregators
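
Once feedback like this arrives, it has to be merged into the result. The sketch below shows one simple policy, where the human verdict overrides the AI verdict and the original AI verdict is kept for auditing. `FeedbackSketch`, `apply_feedback`, and the `ai_verdict` key are hypothetical; the `verdict`, `confidence`, `reasoning` and `human_reviewed` keys follow the result dictionaries printed elsewhere in this notebook.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeedbackSketch:
    """Hypothetical stand-in for the module's HumanFeedback."""
    verdict: str
    confidence: int
    reasoning: str


def apply_feedback(ai_result: dict, feedback: Optional[FeedbackSketch]) -> dict:
    """Return a new result dict; human feedback, when present, wins."""
    result = dict(ai_result)  # copy so the AI result is not mutated
    result["human_reviewed"] = feedback is not None
    if feedback is not None:
        result["ai_verdict"] = ai_result["verdict"]  # keep for auditing
        result["verdict"] = feedback.verdict
        result["confidence"] = feedback.confidence
        result["reasoning"] = feedback.reasoning
    return result
```

Other policies are possible, for example averaging confidences or requiring a second reviewer when the human and the AI disagree.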


## 4. Complete Flow with Human-in-the-Loop

Now let's see the complete system in action:

In [6]:
# Test claims that might trigger human review
test_claims = [
    # Should be straightforward - no human review
    "The Boeing 747 has four engines",
    
    # Ambiguous current event - might trigger review
    "A major airline announced bankruptcy yesterday",
    
    # Extraordinary claim - should trigger review
    "Scientists discovered anti-gravity technology last week",
]

print("🤖 Testing Human-in-the-Loop BS Detector\n")

for claim in test_claims:
    print(f"\n{'='*70}")
    print(f"Claim: \"{claim}\"")
    print("="*70)
    
    # Check claim with human-in-the-loop
    # For demo, we use simulated human input
    result = check_claim_with_human_in_loop(claim)
    
    print(f"\n📊 Final Result:")
    print(f"Verdict: {result['verdict']}")
    print(f"Confidence: {result['confidence']}%")
    print(f"Human Reviewed: {result.get('human_reviewed', False)}")
    
    if result.get('uncertainty_score'):
        print(f"Uncertainty Score: {result['uncertainty_score']:.2f}")
    
    if result.get('human_feedback'):
        print(f"\nHuman Feedback:")
        print(f"  - Verdict: {result['human_feedback']['verdict']}")
        print(f"  - Confidence: {result['human_feedback']['confidence']}%")

🤖 Testing Human-in-the-Loop BS Detector


Claim: "The Boeing 747 has four engines"

📊 Final Result:
Verdict: LEGITIMATE
Confidence: 100%
Human Reviewed: False

Claim: "A major airline announced bankruptcy yesterday"



📊 Final Result:
Verdict: BS
Confidence: 90%
Human Reviewed: False

Claim: "Scientists discovered anti-gravity technology last week"



📊 Final Result:
Verdict: BS
Confidence: 90%
Human Reviewed: False


In [7]:
# Test claims that might trigger human review
test_claims = [
    # Should be straightforward - no human review
    "The Boeing 747 has four engines",
    
    # Ambiguous current event - might trigger review
    "A major airline announced bankruptcy yesterday",
    
    # Extraordinary claim - should trigger review
    "Scientists discovered anti-gravity technology last week",
]

print("🤖 Testing Human-in-the-Loop BS Detector\n")

# Add debugging
print(f"Using function: {check_claim_with_human_in_loop}")
print(f"Module location: {check_claim_with_human_in_loop.__module__}")

for claim in test_claims:
    print(f"\n{'='*70}")
    print(f"Claim: \"{claim}\"")
    print("="*70)
    
    try:
        # Check claim with human-in-the-loop
        # For demo, we use simulated human input
        result = check_claim_with_human_in_loop(claim)
        
        print(f"\n📊 Final Result:")
        print(f"Verdict: {result['verdict']}")
        print(f"Confidence: {result['confidence']}%")
        print(f"Human Reviewed: {result.get('human_reviewed', False)}")
        print(f"Analyzing Agent: {result.get('analyzing_agent', 'N/A')}")
        
        if result.get('uncertainty_score'):
            print(f"Uncertainty Score: {result['uncertainty_score']:.2f}")
        
        if result.get('human_feedback'):
            print(f"\nHuman Feedback:")
            print(f"  - Verdict: {result['human_feedback']['verdict']}")
            print(f"  - Confidence: {result['human_feedback']['confidence']}%")
    except Exception as e:
        print(f"\n❌ Error: {type(e).__name__}: {str(e)}")
        import traceback
        traceback.print_exc()

🤖 Testing Human-in-the-Loop BS Detector

Using function: <function check_claim_with_human_in_loop at 0x114b54180>
Module location: modules.m5_human_in_loop

Claim: "The Boeing 747 has four engines"

📊 Final Result:
Verdict: LEGITIMATE
Confidence: 100%
Human Reviewed: False
Analyzing Agent: Technical Expert

Claim: "A major airline announced bankruptcy yesterday"



📊 Final Result:
Verdict: BS
Confidence: 85%
Human Reviewed: False
Analyzing Agent: Current Events Expert (with tools)

Claim: "Scientists discovered anti-gravity technology last week"



📊 Final Result:
Verdict: BS
Confidence: 90%
Human Reviewed: False
Analyzing Agent: Current Events Expert (with tools)


### Testing Different Scenarios

Let's test claims with different characteristics to see when human review is triggered:

In [8]:
# Let's force a human review by creating a low-confidence scenario
print("🔬 Testing Human Review Triggering\n")

# Create a state with very low confidence to trigger review
from modules.m5_human_in_loop import uncertainty_detector_node

test_state = HumanInLoopState(
    claim="Some ambiguous aviation claim that's hard to verify",
    verdict="BS",
    confidence=30,  # Very low confidence
    reasoning="Highly uncertain about this claim",
    claim_type="general",
    analyzing_agent="General Expert"
)

# Run uncertainty detection
updates = uncertainty_detector_node(test_state)

print(f"Uncertainty Score: {updates['uncertainty_score']:.2f}")
print(f"Needs Human Review: {updates['needs_human_review']}")
if updates.get('review_reasons'):
    print(f"\nReview Reasons:")
    for reason in updates['review_reasons']:
        print(f"  - {reason}")

if updates.get('human_review_request'):
    print("\n" + "="*60)
    print(updates['human_review_request'].format_for_human())

🔬 Testing Human Review Triggering

Uncertainty Score: 0.40
Needs Human Review: True

Review Reasons:
  - Very low confidence: 30%


🤔 HUMAN REVIEW REQUESTED

**Claim**: Some ambiguous aviation claim that's hard to verify

**AI Assessment**:
- Verdict: BS
- Confidence: 30%
- Reasoning: Highly uncertain about this claim

**Uncertainty Reasons**:
- Very low confidence: 30%





In [9]:
# Interactive demo with real human input
print("🎮 Interactive Human-in-the-Loop Demo")
print("You'll be asked to review claims when the AI is uncertain.\n")

# Test with a claim that should trigger review
ambiguous_claim = "The new supersonic passenger jet breaks physics laws"

print(f"Testing claim: \"{ambiguous_claim}\"\n")

# Use interactive handler
# Note: In Jupyter, this will create input fields
result = check_claim_with_human_in_loop(
    ambiguous_claim,
    human_input_handler=interactive_human_input
)

print("\n" + "="*60)
print("Final Decision:")
print(f"Verdict: {result['verdict']}")
print(f"Confidence: {result['confidence']}%")
print(f"Reasoning: {result['reasoning']}")

🎮 Interactive Human-in-the-Loop Demo
You'll be asked to review claims when the AI is uncertain.

Testing claim: "The new supersonic passenger jet breaks physics laws"


Final Decision:
Verdict: BS
Confidence: 95%
Reasoning: The claim that a new supersonic passenger jet "breaks physics laws" is highly unlikely and technically inaccurate. Aviation design and operation are strictly governed by the fundamental laws of physics, including aerodynamics, thermodynamics, and materials science. Supersonic flight is well understood and has been achieved before—most notably by the Concorde and military jets—without violating any physical laws.

Any new supersonic passenger jet must operate within the constraints of known physics, such as overcoming shockwave formation, managing heat generated by air friction at high speeds, and maintaining structural integrity under high dynamic pressures. While new technological advancements can improve efficiency, reduce sonic booms, or enhance materials, they d

In [12]:
# Interactive demo with real human input
print("🎮 Interactive Human-in-the-Loop Demo")
print("You'll be asked to review claims when the AI is uncertain.\n")

# Test with a claim that should trigger review
ambiguous_claim = "The new supersonic passenger jet breaks physics laws"

print(f"Testing claim: \"{ambiguous_claim}\"\n")

# For non-interactive demo, we'll use a mock handler
def mock_human_input(request: HumanReviewRequest) -> HumanFeedback:
    print("🤖 Simulating human review...")
    return HumanFeedback(
        verdict="BS",
        confidence=95,
        reasoning="Laws of physics cannot be broken. Likely hyperbolic marketing claim.",
        additional_context="Supersonic jets must still obey aerodynamics and thermodynamics"
    )

result = check_claim_with_human_in_loop(
    ambiguous_claim,
    human_input_handler=mock_human_input
)

print("\n" + "="*60)
print("Final Decision:")
print(f"Verdict: {result['verdict']}")
print(f"Confidence: {result['confidence']}%")
print(f"Reasoning: {result['reasoning']}")

# To use interactive input in Jupyter, uncomment below:
# result = check_claim_with_human_in_loop(
#     ambiguous_claim,
#     human_input_handler=interactive_human_input
# )

🎮 Interactive Human-in-the-Loop Demo
You'll be asked to review claims when the AI is uncertain.

Testing claim: "The new supersonic passenger jet breaks physics laws"


Final Decision:
Verdict: BS
Confidence: 95%
Reasoning: The claim that a supersonic passenger jet "breaks physics laws" is almost certainly false. Supersonic flight is well-understood and has been achieved multiple times, notably with the Concorde and various military aircraft. The physics governing supersonic flight are based on well-established principles of aerodynamics, thermodynamics, and materials science. While new supersonic jets may incorporate advanced technologies—such as improved engine designs, aerodynamic shaping, and noise reduction techniques—these advancements work within the constraints of physical laws rather than violating them. 

If a jet purportedly "breaks physics laws," it would imply impossible phenomena like exceeding the speed of sound without the expected sonic boom, or achieving propulsion wi

In [None]:
import asyncio
from typing import Optional
import uuid
from datetime import datetime

class AsyncHumanReviewQueue:
    """Async queue for human review requests"""
    
    def __init__(self):
        self.pending_reviews = {}
        self.completed_reviews = {}
    
    async def request_review(self, review_request: HumanReviewRequest) -> str:
        """Submit a review request and get a ticket ID"""
        ticket_id = str(uuid.uuid4())
        self.pending_reviews[ticket_id] = {
            "request": review_request,
            "status": "pending",
            "submitted_at": datetime.now()
        }
        
        print(f"\n📋 Review Request Submitted")
        print(f"Ticket ID: {ticket_id}")
        print(f"Claim: {review_request.claim[:50]}...")
        print(f"Status: Pending human review")
        
        return ticket_id
    
    async def check_review_status(self, ticket_id: str) -> Optional[HumanFeedback]:
        """Check if review is complete"""
        if ticket_id in self.completed_reviews:
            return self.completed_reviews[ticket_id]["feedback"]
        elif ticket_id in self.pending_reviews:
            return None
        else:
            raise ValueError(f"Unknown ticket ID: {ticket_id}")
    
    async def submit_feedback(self, ticket_id: str, feedback: HumanFeedback):
        """Submit human feedback for a review request"""
        if ticket_id not in self.pending_reviews:
            raise ValueError(f"Unknown ticket ID: {ticket_id}")
        
        review = self.pending_reviews.pop(ticket_id)
        review["status"] = "completed"
        review["feedback"] = feedback
        review["completed_at"] = datetime.now()
        
        self.completed_reviews[ticket_id] = review
        print(f"✅ Review completed for ticket {ticket_id}")

# Demo async pattern
async def demo_async_review():
    queue = AsyncHumanReviewQueue()
    
    # Submit review request
    review_req = HumanReviewRequest(
        claim="Quantum computers solved P=NP yesterday",
        ai_verdict="BS",
        ai_confidence=30,
        uncertainty_reasons=["Extraordinary claim", "Very low confidence"]
    )
    
    ticket_id = await queue.request_review(review_req)
    
    # Check status (would be None initially)
    status = await queue.check_review_status(ticket_id)
    print(f"\nInitial status: {status}")
    
    # Simulate human providing feedback
    print("\n⏳ Simulating human review...")
    await asyncio.sleep(1)
    
    feedback = HumanFeedback(
        verdict="BS",
        confidence=99,
        reasoning="P=NP remains unsolved. This would be worldwide news."
    )
    
    await queue.submit_feedback(ticket_id, feedback)
    
    # Check status again
    final_feedback = await queue.check_review_status(ticket_id)
    print(f"\nFinal feedback: {final_feedback.verdict} ({final_feedback.confidence}%)")

# Run async demo
# IPython and Jupyter support top-level await, so this works in a notebook cell.
# In a plain Python script, call asyncio.run(demo_async_review()) instead.
await demo_async_review()

In [None]:
import base64
from IPython.display import Image

# Mermaid diagram of the flow
mermaid_diagram = """
graph TD
    A[Claim Input] --> R{Router}
    
    R -->|Technical| TE[Technical Expert]
    R -->|Historical| HE[Historical Expert]
    R -->|Current Event| CE[Current Events Expert<br/>+ Web Search]
    R -->|General| GE[General Expert]
    
    TE --> UD{Uncertainty<br/>Detector}
    HE --> UD
    CE --> UD
    GE --> UD
    
    UD -->|High Uncertainty| HR[Human Review<br/>Request]
    UD -->|Low Uncertainty| FO[Format Output]
    
    HR --> HF[Human<br/>Feedback]
    HF --> FO
    
    FO --> V[Final Verdict]
    
    style HR fill:#ff9999
    style HF fill:#99ff99
    style UD fill:#ffcc99
"""

def render_mermaid_diagram(graph_def):
    graph_bytes = graph_def.encode("utf-8")
    base64_string = base64.b64encode(graph_bytes).decode("ascii")
    image_url = f"https://mermaid.ink/img/{base64_string}?type=png"
    return Image(url=image_url)

print("📊 Human-in-the-Loop Flow:")
render_mermaid_diagram(mermaid_diagram)

## 8. Production Considerations

### Key Design Decisions:

1. **When to Ask for Help**
   - Confidence thresholds
   - Expert disagreement
   - Claim categories
   - Business rules

2. **User Experience**
   - Clear presentation of uncertainty
   - Relevant context provided
   - Easy feedback interface
   - Response time expectations

3. **Async Patterns**
   - Queue-based systems
   - Webhook notifications
   - Polling vs push
   - Timeout handling

4. **Feedback Loop**
   - Store human decisions
   - Learn from corrections
   - Improve uncertainty detection
   - Track reviewer accuracy
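
Of the async concerns listed above, timeout handling is the easiest to get wrong, so here is a minimal sketch of it. It waits for human feedback up to a deadline and falls back to the AI's own result if nobody answers. The names `wait_for_feedback` and `demo` and the `timed_out` key are hypothetical; `asyncio.wait_for` is the standard-library call doing the work.

```python
import asyncio


async def wait_for_feedback(pending: asyncio.Future,
                            timeout_s: float,
                            fallback: dict) -> dict:
    """Wait for a reviewer's answer; fall back to the AI result on timeout."""
    try:
        return await asyncio.wait_for(pending, timeout=timeout_s)
    except asyncio.TimeoutError:
        # No reviewer answered in time: keep the AI verdict and flag it
        return {**fallback, "human_reviewed": False, "timed_out": True}


async def demo():
    loop = asyncio.get_running_loop()

    # Case 1: a reviewer answers after 10 ms, well within the 1 s deadline
    answered_future = loop.create_future()
    loop.call_later(0.01, answered_future.set_result,
                    {"verdict": "BS", "confidence": 99, "human_reviewed": True})
    answered = await wait_for_feedback(
        answered_future, 1.0, {"verdict": "BS", "confidence": 30})

    # Case 2: nobody answers, so the 50 ms deadline expires
    silent_future = loop.create_future()
    timed_out = await wait_for_feedback(
        silent_future, 0.05, {"verdict": "BS", "confidence": 30})
    return answered, timed_out
```

In a notebook cell this can be run with `await demo()`; in a script, use `asyncio.run(demo())`. Note that `asyncio.wait_for` cancels the pending future when the deadline passes, so a late answer needs to be handled separately if it should still be recorded.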

In [None]:
# Example metrics tracking
class HumanReviewMetrics:
    """Track metrics for human review system"""
    
    def __init__(self):
        self.total_reviews = 0
        self.review_triggers = {}
        self.agreement_rate = 0
        self.avg_response_time = 0
    
    def track_review(self, request: HumanReviewRequest, feedback: HumanFeedback):
        self.total_reviews += 1
        
        # Track trigger reasons
        for reason in request.uncertainty_reasons:
            self.review_triggers[reason] = self.review_triggers.get(reason, 0) + 1
        
        # Track agreement
        if request.ai_verdict == feedback.verdict:
            self.agreement_rate = (
                (self.agreement_rate * (self.total_reviews - 1) + 1) 
                / self.total_reviews
            )
        else:
            self.agreement_rate = (
                (self.agreement_rate * (self.total_reviews - 1)) 
                / self.total_reviews
            )
    
    def get_summary(self):
        return {
            "total_reviews": self.total_reviews,
            "agreement_rate": f"{self.agreement_rate:.1%}",
            "top_triggers": sorted(
                self.review_triggers.items(), 
                key=lambda x: x[1], 
                reverse=True
            )[:3]
        }

# Demo metrics
metrics = HumanReviewMetrics()

# Simulate some reviews
test_reviews = [
    (HumanReviewRequest(
        claim="Test 1",
        ai_verdict="BS",
        uncertainty_reasons=["Low confidence: 40%"]
    ), HumanFeedback(verdict="BS", confidence=90, reasoning="Confirmed BS")),
    
    (HumanReviewRequest(
        claim="Test 2",
        ai_verdict="LEGITIMATE",
        uncertainty_reasons=["Expert disagreement", "Low confidence: 45%"]
    ), HumanFeedback(verdict="BS", confidence=85, reasoning="Actually false")),
]

for req, feedback in test_reviews:
    metrics.track_review(req, feedback)

print("📊 Human Review Metrics:")
summary = metrics.get_summary()
print(f"Total Reviews: {summary['total_reviews']}")
print(f"AI-Human Agreement: {summary['agreement_rate']}")
print("\nTop Review Triggers:")
for trigger, count in summary['top_triggers']:
    print(f"  - {trigger}: {count}")
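
The metrics class above declares `avg_response_time` but never updates it. As a sketch of how response time could be tracked, the helper below keeps a running mean using the `submitted_at` and `completed_at` timestamps that the review queue already stores. `ResponseTimeTracker` and its method names are hypothetical.

```python
from datetime import datetime, timedelta


class ResponseTimeTracker:
    """Hypothetical helper: running mean of human response time in seconds."""

    def __init__(self):
        self.count = 0
        self.avg_seconds = 0.0

    def record(self, submitted_at: datetime, completed_at: datetime) -> float:
        elapsed = (completed_at - submitted_at).total_seconds()
        self.count += 1
        # Incremental mean: no need to store every individual sample
        self.avg_seconds += (elapsed - self.avg_seconds) / self.count
        return self.avg_seconds


tracker = ResponseTimeTracker()
start = datetime(2024, 1, 1, 12, 0, 0)
tracker.record(start, start + timedelta(seconds=10))
tracker.record(start, start + timedelta(seconds=30))
print(f"Average response time: {tracker.avg_seconds:.1f}s")
```

The incremental update is the same idea as the `agreement_rate` calculation above, written in a form that avoids multiplying the old mean back out.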

## Summary

### What We Built:
1. **Uncertainty Detection** - Multi-factor scoring system
2. **Human Review Requests** - Clear, informative format
3. **Feedback Integration** - Structured human input
4. **Async Patterns** - Non-blocking review queue

### Key Takeaways:
- 🎯 **Know When to Ask** - Clear criteria for human review
- 📊 **Provide Context** - Give humans all relevant information  
- ⚡ **Don't Block** - Async patterns for production systems
- 📈 **Track & Learn** - Metrics to improve over time

### Benefits:
- ✅ Handles edge cases gracefully
- ✅ Builds user trust
- ✅ Enables continuous improvement
- ✅ Meets compliance requirements

### Next Steps:
In Iteration 6, we'll add **Memory** so our system can:
- Remember past human feedback
- Learn from previous decisions
- Build a knowledge base over time
- Reduce repeated human reviews