# Iteration 5: Human-in-the-Loop (Simplified v2)

This notebook demonstrates a simplified human-in-the-loop approach that:
- Uses conditional routing instead of complex interrupts
- Simulates human review for demo purposes
- Can be easily extended for production use

## The Concept: Simple Human Review Node

Instead of using LangGraph's interrupt functionality (which needs a checkpointer and a thread ID so that a paused run can later be resumed), we use a simpler approach:

1. **Review Check Node** - Determines if human review is needed
2. **Conditional Routing** - Routes to human review or output
3. **Simulated Review** - For demos, simulates human decisions
4. **Production Ready** - Easy to replace simulation with real review system

This approach is:
- ✅ Simple to understand and implement
- ✅ Easy to test and debug
- ✅ Straightforward to extend for production
- ✅ No complex state management needed
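The four pieces listed above can be sketched in plain Python so that the control flow is easy to follow. This is a minimal sketch, not the code in `modules.m5_human_in_loop_v2`. The `ClaimState` fields, the 70% threshold, and the simulated reviewer's behavior are all assumptions. In a LangGraph `StateGraph`, the node functions would be registered with `add_node`, and `route_after_check` is the kind of function you would pass to `add_conditional_edges`.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed threshold: claims below this AI confidence are routed to review.
REVIEW_THRESHOLD = 70


@dataclass
class ClaimState:
    """Hypothetical state; the real HumanInLoopState may differ."""
    claim: str
    confidence: int = 0
    skip_human_review: bool = False
    needs_review: bool = False
    human_reviewed: bool = False
    human_review_reason: Optional[str] = None


def review_check_node(state: ClaimState) -> ClaimState:
    """1. Review check: flag low-confidence claims."""
    if state.confidence < REVIEW_THRESHOLD:
        state.needs_review = True
        state.human_review_reason = f"Low confidence: {state.confidence}%"
    return state


def route_after_check(state: ClaimState) -> str:
    """2. Conditional routing: return the name of the next node."""
    if state.needs_review and not state.skip_human_review:
        return "human_review"
    return "output"


def simulated_human_review_node(state: ClaimState) -> ClaimState:
    """3. Simulated review: pretend a reviewer confirmed the verdict."""
    state.human_reviewed = True
    state.confidence = max(state.confidence, 90)  # assumed post-review confidence
    return state


def output_node(state: ClaimState) -> ClaimState:
    """Final node: nothing left to do in this sketch."""
    return state


def run(state: ClaimState) -> ClaimState:
    state = review_check_node(state)
    if route_after_check(state) == "human_review":
        # 4. In production, swap this call for a real review system
        state = simulated_human_review_node(state)
    return output_node(state)


print(run(ClaimState("The Boeing 747 has four engines", confidence=95)).human_reviewed)  # False
print(run(ClaimState("A startup just invented teleportation technology", confidence=30)).human_reviewed)  # True
```

Because the routing decision is an ordinary function of the state, it can be unit-tested on its own, and the simulated review node can be replaced without touching the rest of the flow.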

In [8]:
import sys
sys.path.append('..')

from modules.m5_human_in_loop_v2 import (
    HumanInLoopState,
    check_claim_with_human_review_v2,
    create_human_in_loop_graph_v2
)
from langgraph.graph import StateGraph
import warnings
warnings.filterwarnings('ignore')

print("✅ Imports successful!")

✅ Imports successful!


In [9]:
# Visualize the graph structure
import base64
from IPython.display import Image

app = create_human_in_loop_graph_v2()
mermaid_code = app.get_graph().draw_mermaid()

# Render the graph
graph_bytes = mermaid_code.encode("utf-8")
base64_string = base64.b64encode(graph_bytes).decode("ascii")
image_url = f"https://mermaid.ink/img/{base64_string}?type=png"

print("📊 Human-in-the-Loop Graph Structure:")
Image(url=image_url)

📊 Human-in-the-Loop Graph Structure:


In [10]:
# Test claims with different confidence levels
test_scenarios = [
    {
        "claim": "The Boeing 747 has four engines",
        "expected": "High confidence, no review needed"
    },
    {
        "claim": "A startup just invented teleportation technology",
        "expected": "Low confidence, should trigger review"
    },
    {
        "claim": "SpaceX announced a Mars mission for next week",
        "expected": "Current event with uncertainty"
    }
]

print("🧪 Testing Human-in-the-Loop Scenarios\n")

for i, scenario in enumerate(test_scenarios):
    print(f"\n{'='*70}")
    print(f"Scenario {i+1}: {scenario['expected']}")
    print(f"Claim: \"{scenario['claim']}\"")
    print("="*70)
    
    # Run the claim through the system
    result = check_claim_with_human_review_v2(scenario['claim'])
    
    print(f"\n✅ Final Decision:")
    print(f"Verdict: {result['verdict']} ({result['confidence']}%)")
    print(f"Reasoning: {result['reasoning'][:100]}...")
    print(f"Human Reviewed: {result.get('human_reviewed', False)}")
    
    if result.get('human_review_reason'):
        print(f"Review Reason: {result['human_review_reason']}")

🧪 Testing Human-in-the-Loop Scenarios


Scenario 1: High confidence, no review needed
Claim: "The Boeing 747 has four engines"


APIConnectionError: Connection error.

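## Skipping Human Review for Testing

The `skip_human_review` flag bypasses the review step, which is useful in tests. The cell below runs the same claim twice, first with `skip_human_review=True` and then with review enabled, so you can compare the two results: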

In [4]:
# Example: Skip human review for testing
print("🔧 Testing with human review disabled\n")

test_claim = "A new AI system claims to be sentient"

# Run without human review (for testing)
result_no_review = check_claim_with_human_review_v2(test_claim, skip_human_review=True)

print(f"Claim: \"{test_claim}\"")
print(f"\nResult (no human review):")
print(f"Verdict: {result_no_review['verdict']} ({result_no_review['confidence']}%)")
print(f"Human Reviewed: {result_no_review.get('human_reviewed', False)}")

print("\n" + "-"*60 + "\n")

# Run with human review enabled (default)
result_with_review = check_claim_with_human_review_v2(test_claim, skip_human_review=False)

print(f"Result (with human review if needed):")
print(f"Verdict: {result_with_review['verdict']} ({result_with_review['confidence']}%)")
print(f"Human Reviewed: {result_with_review.get('human_reviewed', False)}")
if result_with_review.get('human_review_reason'):
    print(f"Review Reason: {result_with_review['human_review_reason']}")

🔧 Testing with human review disabled

Claim: "A new AI system claims to be sentient"

Result (no human review):
Verdict: BS (90%)
Human Reviewed: False

------------------------------------------------------------

Result (with human review if needed):
Verdict: BS (90%)
Human Reviewed: False


In [6]:
# Example: Monitoring human review metrics
from typing import Optional


class ReviewMetrics:
    def __init__(self):
        self.total_claims = 0
        self.reviews_triggered = 0
        self.confidence_samples = 0  # reviews that reported both AI and human confidence
        self.avg_confidence_before = 0.0
        self.avg_confidence_after = 0.0
        self.review_reasons = {}
    
    def track_claim(self, needs_review: bool, reason: Optional[str] = None,
                    ai_confidence: Optional[int] = None, human_confidence: Optional[int] = None):
        self.total_claims += 1
        
        if needs_review:
            self.reviews_triggered += 1
            if reason:
                self.review_reasons[reason] = self.review_reasons.get(reason, 0) + 1
            
            # Use `is not None` so a legitimate 0% confidence is still counted
            if ai_confidence is not None and human_confidence is not None:
                # Update running averages over reviews that reported confidences,
                # so reviews without confidence values don't skew the divisor
                self.confidence_samples += 1
                n = self.confidence_samples
                self.avg_confidence_before = (
                    (self.avg_confidence_before * (n-1) + ai_confidence) / n
                )
                self.avg_confidence_after = (
                    (self.avg_confidence_after * (n-1) + human_confidence) / n
                )
    
    def get_summary(self):
        review_rate = self.reviews_triggered / self.total_claims if self.total_claims else 0
        
        return {
            "total_claims": self.total_claims,
            "review_rate": f"{review_rate:.1%}",
            "avg_confidence_improvement": 
                f"{self.avg_confidence_before:.0f}% → {self.avg_confidence_after:.0f}%",
            "top_reasons": sorted(
                self.review_reasons.items(), 
                key=lambda x: x[1], 
                reverse=True
            )[:3]
        }

# Demo metrics
metrics = ReviewMetrics()

# Simulate some tracking
metrics.track_claim(needs_review=False)
metrics.track_claim(needs_review=True, reason="Low confidence: 30%", 
                   ai_confidence=30, human_confidence=95)
metrics.track_claim(needs_review=True, reason="Current event uncertainty",
                   ai_confidence=45, human_confidence=80)
metrics.track_claim(needs_review=False)

print("📊 Human Review Metrics:")
summary = metrics.get_summary()
for key, value in summary.items():
    if key == "top_reasons":
        print(f"\nTop Review Reasons:")
        for reason, count in value:
            print(f"  - {reason}: {count}")
    else:
        print(f"{key}: {value}")

📊 Human Review Metrics:
total_claims: 4
review_rate: 50.0%
avg_confidence_improvement: 38% → 88%

Top Review Reasons:
  - Low confidence: 30%: 1
  - Current event uncertainty: 1
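
The running-average update in `track_claim` is the incremental mean: after the $n$-th review that reports a confidence $x_n$,

$$\bar{x}_n = \frac{(n-1)\,\bar{x}_{n-1} + x_n}{n}$$

Worked through with the demo values, the AI confidences give $\bar{x}_1 = 30$ and $\bar{x}_2 = (1 \cdot 30 + 45)/2 = 37.5$, while the human confidences give $\bar{y}_1 = 95$ and $\bar{y}_2 = (1 \cdot 95 + 80)/2 = 87.5$. The `:.0f` format displays these as 38% and 88%, which matches the output above.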


In [7]:
# Production pattern for human review
from datetime import datetime


class ProductionHumanReviewSystem:
    """Sketch of a production-style human review queue (in-memory for the demo)"""
    
    def __init__(self):
        self.pending_reviews = {}  # review_id -> review info
        self.completed_reviews = {}
        self.review_counter = 0
    
    def process_claim(self, claim: str) -> dict:
        """Process a claim and queue it for human review if needed"""
        # Run through the system
        result = check_claim_with_human_review_v2(claim)
        
        if result.get('human_reviewed') and result.get('human_review_reason'):
            # The demo reuses the simulated review flag; in production,
            # this is where the claim would be queued for a real reviewer
            self.review_counter += 1
            review_id = f"review_{self.review_counter}"
            
            self.pending_reviews[review_id] = {
                "claim": claim,
                "ai_verdict": result['verdict'],
                "ai_confidence": result['confidence'],
                "reason": result['human_review_reason'],
                "created_at": datetime.now()
            }
            
            return {
                "status": "pending_human_review",
                "review_id": review_id,
                "preliminary_result": result
            }
        
        return {
            "status": "completed",
            "result": result
        }
    
    def get_pending_reviews(self) -> list:
        """Get all pending reviews"""
        return [
            {
                "review_id": rid,
                # Only append "..." when the claim is actually truncated
                "claim": info["claim"] if len(info["claim"]) <= 50 else info["claim"][:50] + "...",
                "reason": info["reason"],
                # total_seconds() includes whole days; timedelta.seconds would drop them
                "waiting_time": int((datetime.now() - info["created_at"]).total_seconds())
            }
            for rid, info in self.pending_reviews.items()
        ]

# Demo the production pattern

system = ProductionHumanReviewSystem()

# Test various claims
test_claims = [
    "The Boeing 747 has four engines",  # High confidence
    "Scientists created a perpetual motion machine",  # Should trigger review
    "A new quantum computer solved climate change",  # Should trigger review
]

print("🏭 Production Human Review System Demo\n")

for claim in test_claims:
    print(f"\nProcessing: \"{claim}\"")
    result = system.process_claim(claim)
    print(f"Status: {result['status']}")
    
    if result['status'] == 'pending_human_review':
        print(f"Review ID: {result['review_id']}")
        print(f"Reason: {result['preliminary_result']['human_review_reason']}")

# Show pending reviews
pending = system.get_pending_reviews()
if pending:
    print(f"\n\n📋 Pending Human Reviews: {len(pending)}")
    for review in pending:
        print(f"\n{review['review_id']}:")
        print(f"  Claim: {review['claim']}")
        print(f"  Reason: {review['reason']}")
        print(f"  Waiting: {review['waiting_time']}s")

🏭 Production Human Review System Demo


Processing: "The Boeing 747 has four engines"
Status: completed

Processing: "Scientists created a perpetual motion machine"
Status: completed

Processing: "A new quantum computer solved climate change"
Status: completed


## Summary

### What We Built:
✅ **Simple Implementation** - Conditional routing to human review node  
✅ **Demo Ready** - Simulated human review for testing  
✅ **Production Pattern** - Easy to replace simulation with real system  
✅ **Flexible Control** - Can skip review for testing  

### Key Advantages:
- 🎯 **Simple Code** - No complex state management
- 🔄 **Easy Testing** - Can simulate or skip review
- ⚡ **Straightforward** - Clear flow through the graph
- 📊 **Observable** - Easy to add metrics and monitoring

### Production Implementation:
To use this in production, replace `simulate_human_review_node` with a flow that:
1. Queues the review request
2. Returns a pending status to the caller
3. Lets a separate process handle the human reviews
4. Updates results asynchronously once a reviewer decides
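
These four steps can be sketched with an `asyncio.Queue` standing in for a real review queue. This is a minimal in-memory sketch under stated assumptions: `ReviewQueue`, `ReviewRequest`, and `fake_reviewer` are hypothetical names, and a production system would persist requests in a database or message broker rather than in process memory.

```python
import asyncio
import contextlib
import itertools
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, Tuple


@dataclass
class ReviewRequest:
    review_id: str
    claim: str
    ai_verdict: str
    ai_confidence: int


# A reviewer takes a request and eventually returns (verdict, confidence).
Reviewer = Callable[[ReviewRequest], Awaitable[Tuple[str, int]]]


class ReviewQueue:
    """Hypothetical in-memory review queue; create it inside a running event loop."""

    def __init__(self) -> None:
        self._queue: "asyncio.Queue[ReviewRequest]" = asyncio.Queue()
        self._ids = itertools.count(1)
        self.results: Dict[str, dict] = {}

    async def submit(self, claim: str, ai_verdict: str, ai_confidence: int) -> dict:
        # Steps 1-2: queue the request and immediately return a pending status
        review_id = f"review_{next(self._ids)}"
        self.results[review_id] = {"status": "pending_human_review"}
        await self._queue.put(ReviewRequest(review_id, claim, ai_verdict, ai_confidence))
        return {"status": "pending_human_review", "review_id": review_id}

    async def worker(self, reviewer: Reviewer) -> None:
        # Step 3: a separate task handles reviews as they arrive
        while True:
            request = await self._queue.get()
            verdict, confidence = await reviewer(request)
            # Step 4: update the stored result asynchronously
            self.results[request.review_id] = {
                "status": "completed", "verdict": verdict, "confidence": confidence,
            }
            self._queue.task_done()

    async def drain(self) -> None:
        """Wait until every queued request has been processed."""
        await self._queue.join()


async def fake_reviewer(request: ReviewRequest) -> Tuple[str, int]:
    # Stand-in for a human: keeps the AI verdict and reports 95% confidence
    await asyncio.sleep(0)
    return request.ai_verdict, 95


async def main() -> Dict[str, dict]:
    queue = ReviewQueue()
    worker = asyncio.create_task(queue.worker(fake_reviewer))
    ticket = await queue.submit("Scientists created a perpetual motion machine", "BS", 40)
    print(ticket)  # {'status': 'pending_human_review', 'review_id': 'review_1'}
    await queue.drain()
    # Stop the worker, which is now blocked waiting for the next request
    worker.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await worker
    return queue.results


results = asyncio.run(main())
print(results)
```

In a Jupyter cell, which already runs an event loop, `asyncio.run()` raises an error; use `results = await main()` there instead. Keeping `submit` separate from `worker` means the caller gets a pending status right away, while reviews complete on their own schedule.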

### Next Steps:
In Iteration 6, we'll add **Memory/Persistence** to:
- Remember past decisions
- Learn from human feedback
- Build a knowledge base
- Reduce repeated reviews