# Hallucination Detection Demo

Functional demos for hallucination detection in LLM outputs:

1. **LettuceDetect** - Token-level detection with ModernBERT
2. **NLI-Based Detection** - Natural Language Inference for contradiction detection
3. **Self-Consistency** - Multi-sample agreement check
4. **HaluGate-Style Pipeline** - Sentinel → Detector → Explainer
5. **Grounded Prompting** - Mitigation via prompt engineering

## Hallucination Types

- **Intrinsic**: Contradicts provided context
- **Extrinsic**: Fabricates information not in context

## Setup

```bash
pip install lettucedetect transformers torch
```

**Ollama:** `ollama pull qwen3:4b`


In [17]:
# Setup: Logging, environment checks, and Ollama client
import subprocess
import logging
import os
import json
import requests
import re
from typing import List, Dict, Any, Tuple

# Color-coded logging
class ColoredFormatter(logging.Formatter):
    COLORS = {'DEBUG': '\033[90m', 'INFO': '\033[92m', 'WARNING': '\033[93m', 'ERROR': '\033[91m', 'RESET': '\033[0m'}
    def format(self, record):
        color = self.COLORS.get(record.levelname, self.COLORS['RESET'])
        record.msg = f"{color}[{record.levelname}]{self.COLORS['RESET']} {record.msg}"
        return super().format(record)

logger = logging.getLogger("hallucination_demo")
logger.setLevel(logging.DEBUG)
if not logger.handlers:
    handler = logging.StreamHandler()
    handler.setFormatter(ColoredFormatter('%(message)s'))
    logger.addHandler(handler)

# Check Ollama
def check_ollama():
    try:
        result = subprocess.run(["ollama", "list"], capture_output=True, text=True, timeout=5)
        if result.returncode == 0:
            logger.info("Ollama is running")
            print(f"\nAvailable models:\n{result.stdout}")
            return True
    except Exception as e:
        logger.error(f"Ollama check failed: {e}")
    return False

ollama_ready = check_ollama()
MODEL = "qwen3:4b"
OLLAMA_URL = "http://localhost:11434"

if ollama_ready:
    logger.info(f"Using model: {MODEL}")

# Ollama generate function
def ollama_generate(prompt: str, model: str = MODEL, temperature: float = 0.7) -> str:
    """Generate response from Ollama."""
    response = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={
            "model": model,
            "prompt": prompt,
            "stream": False,
            "options": {"temperature": temperature}
        },
        timeout=120
    )
    return response.json().get("response", "")

def ollama_chat(messages: List[Dict], model: str = MODEL, temperature: float = 0.7) -> str:
    """Chat completion from Ollama."""
    response = requests.post(
        f"{OLLAMA_URL}/api/chat",
        json={
            "model": model,
            "messages": messages,
            "stream": False,
            "options": {"temperature": temperature}
        },
        timeout=120
    )
    return response.json().get("message", {}).get("content", "")

def clean_response(text: str) -> str:
    """Remove thinking tags from qwen3 responses."""
    return re.sub(r'<think>.*?</think>', '', text, flags=re.DOTALL).strip()

logger.info("Helper functions ready")


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
[92m[INFO][0m Ollama is running
[92m[INFO][0m Using model: qwen3:4b
[92m[INFO][0m Helper functions ready



Available models:
NAME           ID              SIZE      MODIFIED     
qwen3:4b       359d7dd4bcda    2.5 GB    44 hours ago    
llama3.2:1b    baf6a787fdff    1.3 GB    44 hours ago    



---

# Demo 1: LettuceDetect (Token-Level Detection)

**LettuceDetect** uses ModernBERT for token-level hallucination detection in RAG pipelines.

**Key features:**
- Fast: Runs at inference time (~50-100ms)
- Token-level: Flags specific unsupported tokens
- ModernBERT-based: Supports up to 8,192 tokens
- Local execution: No external API calls

**Install:** `pip install lettucedetect`


In [18]:
# LettuceDetect: Setup and initialization

print("="*65)
print("LETTUCEDETECT: Token-Level Hallucination Detection")
print("="*65)

lettucedetect_available = False
detector = None

try:
    from lettucedetect.models.inference import HallucinationDetector
    
    # Initialize the detector with ModernBERT model
    # Model: KRLabsOrg/lettuce-detect-base-modernbert-en-v1
    print("\nLoading LettuceDetect model...")
    print("(First run downloads ~500MB)")
    
    detector = HallucinationDetector(
        model_path="KRLabsOrg/lettuce-detect-base-modernbert-en-v1",
        model_type="token"
    )
    
    lettucedetect_available = True
    logger.info("LettuceDetect loaded successfully")
    print("\n✓ Model: KRLabsOrg/lettuce-detect-base-modernbert-en-v1")
    print("✓ Type: Token-level classification")
    print("✓ Context window: 8,192 tokens (ModernBERT)")
    
except ImportError as e:
    logger.warning(f"LettuceDetect not installed: {e}")
    print("\n✗ LettuceDetect not available")
    print("\nTo install:")
    print("  pip install lettucedetect")
    
except Exception as e:
    logger.error(f"LettuceDetect initialization failed: {e}")
    print(f"\n✗ Error: {e}")
    print("\nNote: First run downloads the model (~500MB)")


LETTUCEDETECT: Token-Level Hallucination Detection

Loading LettuceDetect model...
(First run downloads ~500MB)


[91m[ERROR][0m LettuceDetect initialization failed: KRLabsOrg/lettuce-detect-base-modernbert-en-v1 is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with `hf auth login` or by passing `token=<your_token>`



✗ Error: KRLabsOrg/lettuce-detect-base-modernbert-en-v1 is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with `hf auth login` or by passing `token=<your_token>`

Note: First run downloads the model (~500MB)


In [19]:
# LettuceDetect: Demo with RAG-style hallucination detection

print("="*65)
print("LETTUCEDETECT: RAG Hallucination Detection Demo")
print("="*65)

# Test cases: context + question + answer with potential hallucinations
test_cases = [
    {
        "name": "Faithful Response (No Hallucination)",
        "context": ["The Eiffel Tower is located in Paris, France. It was completed in 1889 and stands 330 meters tall."],
        "question": "Where is the Eiffel Tower located?",
        "answer": "The Eiffel Tower is located in Paris, France."
    },
    {
        "name": "Intrinsic Hallucination (Contradicts Context)",
        "context": ["The Eiffel Tower is located in Paris, France. It was completed in 1889."],
        "question": "Where is the Eiffel Tower located?",
        "answer": "The Eiffel Tower is located in Berlin, Germany."
    },
    {
        "name": "Extrinsic Hallucination (Fabricated Info)",
        "context": ["The Eiffel Tower is located in Paris, France."],
        "question": "Tell me about the Eiffel Tower.",
        "answer": "The Eiffel Tower is in Paris. It was designed by Alexander Graham Bell in 1920."
    },
    {
        "name": "Partial Hallucination",
        "context": ["Apple Inc. was founded by Steve Jobs, Steve Wozniak, and Ronald Wayne in 1976."],
        "question": "Who founded Apple?",
        "answer": "Apple was founded by Steve Jobs and Bill Gates in 1976."
    }
]

if not lettucedetect_available:
    print("\n⚠ LettuceDetect not available - showing conceptual demo")
    print("""
CONCEPTUAL USAGE:

    from lettucedetect.models.inference import HallucinationDetector
    
    detector = HallucinationDetector(
        model_path="KRLabsOrg/lettuce-detect-base-modernbert-en-v1",
        model_type="token"
    )
    
    # Detect hallucinations - returns list of span predictions
    predictions = detector.predict(
        context=["Paris is the capital of France."],
        question="What is the capital of France?",
        answer="The capital of France is Lyon."
    )
    
    # Each prediction has: start, end, label, confidence
    for pred in predictions:
        if pred.label == "hallucinated":
            print(f"Hallucinated span: {answer[pred.start:pred.end]}")
    """)
else:
    for case in test_cases:
        print(f"\n{'─'*65}")
        print(f"[{case['name']}]")
        print(f"{'─'*65}")
        ctx_display = case['context'][0][:80] + "..." if len(case['context'][0]) > 80 else case['context'][0]
        print(f"Context: {ctx_display}")
        print(f"Question: {case['question']}")
        print(f"Answer: {case['answer']}")
        
        try:
            # Get token-level predictions
            predictions = detector.predict(
                context=case['context'],
                question=case['question'],
                answer=case['answer']
            )
            
            # Analyze predictions - predictions is a list of spans with labels
            hallucinated_spans = []
            for pred in predictions:
                if hasattr(pred, 'label') and pred.label == 'hallucinated':
                    # Extract the hallucinated text from the answer
                    span_text = case['answer'][pred.start:pred.end] if hasattr(pred, 'start') else str(pred)
                    hallucinated_spans.append(span_text)
            
            if hallucinated_spans:
                print(f"\n⚠ HALLUCINATION DETECTED")
                print(f"  Flagged spans: {hallucinated_spans}")
            else:
                print(f"\n✓ No hallucination detected")
                
        except Exception as e:
            logger.error(f"Detection failed: {e}")
            print(f"\n✗ Error: {e}")


LETTUCEDETECT: RAG Hallucination Detection Demo

⚠ LettuceDetect not available - showing conceptual demo

CONCEPTUAL USAGE:

    from lettucedetect.models.inference import HallucinationDetector

    detector = HallucinationDetector(
        model_path="KRLabsOrg/lettuce-detect-base-modernbert-en-v1",
        model_type="token"
    )

    # Detect hallucinations - returns list of span predictions
    predictions = detector.predict(
        context=["Paris is the capital of France."],
        question="What is the capital of France?",
        answer="The capital of France is Lyon."
    )

    # Each prediction has: start, end, label, confidence
    for pred in predictions:
        if pred.label == "hallucinated":
            print(f"Hallucinated span: {answer[pred.start:pred.end]}")
    


In [20]:
# LettuceDetect: Live demo with LLM-generated responses

print("="*65)
print("LETTUCEDETECT: Live Demo with Ollama")
print("="*65)

if not ollama_ready:
    print("\n⚠ Ollama not running - skipping live demo")
elif not lettucedetect_available:
    print("\n⚠ LettuceDetect not available - skipping live demo")
else:
    # RAG-style context and questions
    rag_scenarios = [
        {
            "name": "Technical Documentation",
            "context": """TechCorp Cloud offers three pricing tiers: 
            - Basic: $10/month, 100GB storage, 5 users
            - Pro: $50/month, 1TB storage, 25 users  
            - Enterprise: Custom pricing, unlimited storage, unlimited users
            All plans include 24/7 email support. Phone support is only available for Enterprise.""",
            "question": "What's included in the Pro plan?"
        },
        {
            "name": "Historical Facts",
            "context": """The Apollo 11 mission landed on the Moon on July 20, 1969. 
            Neil Armstrong was the first human to walk on the Moon, followed by Buzz Aldrin. 
            Michael Collins remained in lunar orbit aboard the command module.""",
            "question": "Who walked on the Moon during Apollo 11?"
        }
    ]
    
    for scenario in rag_scenarios:
        print(f"\n{'─'*65}")
        print(f"[{scenario['name']}]")
        print(f"{'─'*65}")
        print(f"Context: {scenario['context'][:100]}...")
        print(f"Question: {scenario['question']}")
        
        # Generate response with Ollama
        prompt = f"""Based ONLY on the following context, answer the question.
        
Context: {scenario['context']}

Question: {scenario['question']}

Answer concisely based only on the context:"""
        
        print("\nGenerating response...")
        response = ollama_generate(prompt, temperature=0.3)
        response = clean_response(response)
        
        resp_display = response[:200] + "..." if len(response) > 200 else response
        print(f"Response: {resp_display}")
        
        # Detect hallucinations
        print("\nRunning hallucination detection...")
        try:
            predictions = detector.predict(
                context=[scenario['context']],
                question=scenario['question'],
                answer=response
            )
            
            # Check for hallucinations in predictions
            has_hallucination = False
            flagged = []
            for pred in predictions:
                if hasattr(pred, 'label') and pred.label == 'hallucinated':
                    has_hallucination = True
                    span_text = response[pred.start:pred.end] if hasattr(pred, 'start') else str(pred)
                    flagged.append(span_text)
            
            if has_hallucination:
                print(f"⚠ HALLUCINATION DETECTED: {flagged}")
            else:
                print("✓ Response appears grounded in context")
                
        except Exception as e:
            print(f"Detection error: {e}")


LETTUCEDETECT: Live Demo with Ollama

⚠ LettuceDetect not available - skipping live demo


---

# Demo 2: NLI-Based Hallucination Detection

Natural Language Inference (NLI) models classify text pairs as:
- **Entailment**: Response follows from context
- **Contradiction**: Response contradicts context (hallucination!)
- **Neutral**: Neither follows nor contradicts

**Pros:** Simple, well-understood, catches intrinsic hallucinations
**Cons:** Sentence-level (not token-level), may miss subtle issues

**Models:** `cross-encoder/nli-deberta-v3-base`, `facebook/bart-large-mnli`


In [21]:
# NLI-Based Detection: Setup

print("="*65)
print("NLI-BASED HALLUCINATION DETECTION: Setup")
print("="*65)

nli_available = False
nli_pipeline = None

try:
    from transformers import pipeline
    import torch
    
    # Use a smaller, faster NLI model for demo
    # Options: cross-encoder/nli-deberta-v3-base, facebook/bart-large-mnli
    NLI_MODEL = "cross-encoder/nli-deberta-v3-base"
    
    print(f"\nLoading NLI model: {NLI_MODEL}...")
    print("(First run downloads model ~300MB)")
    
    # Determine device
    if torch.backends.mps.is_available():
        device = "mps"
    elif torch.cuda.is_available():
        device = "cuda"
    else:
        device = "cpu"
    
    nli_pipeline = pipeline(
        "text-classification",
        model=NLI_MODEL,
        device=device
    )
    
    nli_available = True
    logger.info("NLI model loaded")
    print(f"\n✓ Model loaded: {NLI_MODEL}")
    print(f"✓ Device: {device.upper()}")
    
except ImportError as e:
    logger.warning(f"Transformers not installed: {e}")
    print("\n✗ Transformers not available")
    print("\nTo install:")
    print("  pip install transformers torch")
    
except Exception as e:
    logger.error(f"NLI setup failed: {e}")
    print(f"\n✗ Error: {e}")


NLI-BASED HALLUCINATION DETECTION: Setup

Loading NLI model: cross-encoder/nli-deberta-v3-base...
(First run downloads model ~300MB)


Device set to use mps
[92m[INFO][0m NLI model loaded



✓ Model loaded: cross-encoder/nli-deberta-v3-base
✓ Device: MPS


In [22]:
# NLI-Based Detection: Demo

print("="*65)
print("NLI-BASED DETECTION: Contradiction Analysis")
print("="*65)

def extract_claims(text: str) -> List[str]:
    """Split text into claim sentences."""
    sentences = re.split(r'[.!?]+', text)
    return [s.strip() for s in sentences if len(s.strip()) > 10]

def nli_detect_hallucination(context: str, response: str, threshold: float = 0.7) -> Dict[str, Any]:
    """Detect hallucinations using NLI contradiction detection."""
    if not nli_available:
        return {"error": "NLI not available"}
    
    claims = extract_claims(response)
    results = []
    
    for claim in claims:
        # NLI input format for cross-encoder models
        prediction = nli_pipeline(f"{context}</s></s>{claim}")[0]
        
        results.append({
            "claim": claim,
            "label": prediction["label"],
            "score": prediction["score"]
        })
    
    # Find contradictions
    contradictions = [r for r in results if r["label"] == "contradiction" and r["score"] >= threshold]
    
    return {
        "has_hallucination": len(contradictions) > 0,
        "contradictions": contradictions,
        "all_claims": results
    }

# Test cases
nli_test_cases = [
    {
        "name": "Faithful Response",
        "context": "The capital of France is Paris. Paris has a population of about 2 million people.",
        "response": "Paris is the capital of France with approximately 2 million residents."
    },
    {
        "name": "Contradiction (Intrinsic Hallucination)",
        "context": "The capital of France is Paris. Paris has a population of about 2 million people.",
        "response": "London is the capital of France. It has a population of 10 million."
    },
    {
        "name": "Mixed: Correct + Fabricated",
        "context": "Python was created by Guido van Rossum and first released in 1991.",
        "response": "Python was created by Guido van Rossum. It was developed at MIT in 1985."
    }
]

if nli_available:
    for case in nli_test_cases:
        print(f"\n{'─'*65}")
        print(f"[{case['name']}]")
        print(f"{'─'*65}")
        print(f"Context: {case['context']}")
        print(f"Response: {case['response']}")
        
        result = nli_detect_hallucination(case['context'], case['response'])
        
        print(f"\nClaim Analysis:")
        for claim_result in result['all_claims']:
            symbol = "⚠" if claim_result['label'] == 'contradiction' else "✓" if claim_result['label'] == 'entailment' else "○"
            claim_display = claim_result['claim'][:50] + "..." if len(claim_result['claim']) > 50 else claim_result['claim']
            print(f"  {symbol} [{claim_result['label']:12}] ({claim_result['score']:.2f}) \"{claim_display}\"")
        
        if result['has_hallucination']:
            print(f"\n⚠ HALLUCINATION DETECTED: {len(result['contradictions'])} contradicting claim(s)")
        else:
            print(f"\n✓ No contradictions detected")
else:
    print("\n⚠ NLI model not available - showing conceptual usage")
    print("""
    from transformers import pipeline
    
    nli = pipeline("text-classification", model="cross-encoder/nli-deberta-v3-base")
    
    # Format: premise </s></s> hypothesis
    result = nli("Paris is in France.</s></s>Paris is in Germany.")
    # Returns: {'label': 'contradiction', 'score': 0.98}
    """)


NLI-BASED DETECTION: Contradiction Analysis

─────────────────────────────────────────────────────────────────
[Faithful Response]
─────────────────────────────────────────────────────────────────
Context: The capital of France is Paris. Paris has a population of about 2 million people.
Response: Paris is the capital of France with approximately 2 million residents.

Claim Analysis:
  ✓ [entailment  ] (1.00) "Paris is the capital of France with approximately ..."

✓ No contradictions detected

─────────────────────────────────────────────────────────────────
[Contradiction (Intrinsic Hallucination)]
─────────────────────────────────────────────────────────────────
Context: The capital of France is Paris. Paris has a population of about 2 million people.
Response: London is the capital of France. It has a population of 10 million.

Claim Analysis:
  ⚠ [contradiction] (1.00) "London is the capital of France"
  ⚠ [contradiction] (1.00) "It has a population of 10 million"

⚠ HALLUCINATION 

In [23]:
# NLI-Based Detection: Live demo with Ollama

print("="*65)
print("NLI DETECTION: Live Demo with LLM-Generated Responses")
print("="*65)

if not ollama_ready:
    print("\n⚠ Ollama not running - skipping live demo")
elif not nli_available:
    print("\n⚠ NLI model not available - skipping live demo")
else:
    # Test with a prompt likely to cause hallucination
    context = """TechCorp was founded in 2015 by Jane Smith in Austin, Texas. 
    The company specializes in cloud security software. 
    As of 2024, TechCorp has 500 employees and annual revenue of $50 million."""
    
    # Question that might tempt the LLM to hallucinate
    question = "Tell me everything about TechCorp including its founding, products, and any acquisitions."
    
    prompt = f"""Answer the question based on the context. Only include information from the context.

Context: {context}

Question: {question}

Answer:"""
    
    print(f"Context: {context[:100]}...")
    print(f"Question: {question}")
    print("\nGenerating response...")
    
    response = ollama_generate(prompt, temperature=0.7)
    response = clean_response(response)
    
    print(f"\nResponse: {response}")
    
    # Run NLI detection
    print(f"\n{'─'*65}")
    print("NLI Analysis:")
    print(f"{'─'*65}")
    
    result = nli_detect_hallucination(context, response)
    
    for claim_result in result['all_claims']:
        symbol = "⚠" if claim_result['label'] == 'contradiction' else "✓" if claim_result['label'] == 'entailment' else "○"
        claim_display = claim_result['claim'][:60] + "..." if len(claim_result['claim']) > 60 else claim_result['claim']
        print(f"  {symbol} [{claim_result['label']:12}] {claim_display}")
    
    if result['has_hallucination']:
        print(f"\n⚠ POTENTIAL HALLUCINATIONS: {len(result['contradictions'])} contradicting claim(s)")
    else:
        print(f"\n✓ Response appears consistent with context")
        print("  Note: NLI catches contradictions but may miss fabricated details")


NLI DETECTION: Live Demo with LLM-Generated Responses
Context: TechCorp was founded in 2015 by Jane Smith in Austin, Texas. 
    The company specializes in cloud s...
Question: Tell me everything about TechCorp including its founding, products, and any acquisitions.

Generating response...

Response: Based solely on the provided context, here is everything about TechCorp:

- TechCorp was founded in 2015 by Jane Smith in Austin, Texas.
- The company specializes in cloud security software.
- As of 2024, TechCorp has 500 employees and annual revenue of $50 million.
- The context does not mention any specific products beyond the specialization in cloud security software.
- The context does not mention any acquisitions.

─────────────────────────────────────────────────────────────────
NLI Analysis:
─────────────────────────────────────────────────────────────────
  ✓ [entailment  ] Based solely on the provided context, here is everything abo...
  ✓ [entailment  ] - The company specializes 

---

# Demo 3: Self-Consistency Check

Generate multiple responses and check agreement on factual claims.

**Principle:** If the model is uncertain/hallucinating, multiple samples with temperature > 0 will disagree.

**Pros:** No additional models needed, cheap
**Cons:** Multiple LLM calls, only catches uncertainty-based hallucinations


In [24]:
# Self-Consistency: Implementation and Demo

print("="*65)
print("SELF-CONSISTENCY: Multi-Sample Agreement Check")
print("="*65)

def self_consistency_check(
    prompt: str,
    n_samples: int = 3,
    temperature: float = 0.8
) -> Dict[str, Any]:
    """Generate multiple responses and check for consistency."""
    
    responses = []
    for i in range(n_samples):
        resp = ollama_generate(prompt, temperature=temperature)
        resp = clean_response(resp)
        responses.append(resp)
    
    # Simple consistency check: extract key facts and compare
    # In production, use embedding similarity or structured extraction
    
    # Extract numbers mentioned in each response
    def extract_numbers(text):
        return set(re.findall(r'\b\d+(?:\.\d+)?(?:,\d+)*\b', text))
    
    # Extract proper nouns (simplified)
    def extract_entities(text):
        words = text.split()
        return set(w for w in words if w and w[0].isupper() and len(w) > 2)
    
    all_numbers = [extract_numbers(r) for r in responses]
    all_entities = [extract_entities(r) for r in responses]
    
    # Check agreement
    if all(len(nums) > 0 for nums in all_numbers):
        number_agreement = len(set.intersection(*all_numbers)) / max(len(set.union(*all_numbers)), 1)
    else:
        number_agreement = 1.0  # No numbers to compare
        
    if all(len(ents) > 0 for ents in all_entities):
        entity_agreement = len(set.intersection(*all_entities)) / max(len(set.union(*all_entities)), 1)
    else:
        entity_agreement = 1.0  # No entities to compare
    
    overall_consistency = (number_agreement + entity_agreement) / 2
    
    return {
        "responses": responses,
        "consistency_score": overall_consistency,
        "number_agreement": number_agreement,
        "entity_agreement": entity_agreement,
        "common_numbers": set.intersection(*all_numbers) if all_numbers and all(all_numbers) else set(),
        "common_entities": set.intersection(*all_entities) if all_entities and all(all_entities) else set()
    }

if not ollama_ready:
    print("\n⚠ Ollama not running - showing conceptual demo")
    print("""
CONCEPTUAL USAGE:

    def self_consistency_check(prompt, n_samples=3, temperature=0.8):
        responses = [llm.generate(prompt, temperature=temperature) for _ in range(n_samples)]
        
        # Extract and compare key facts
        facts = [extract_facts(r) for r in responses]
        
        # Calculate agreement
        agreement = intersection(facts) / union(facts)
        
        return agreement < 0.5  # Low agreement suggests hallucination
    """)
else:
    # Test with a factual question
    context = "The Great Wall of China is approximately 21,196 kilometers long. Construction began in 7th century BC."
    prompt = f"""Based on this context, answer the question.
    
Context: {context}

Question: How long is the Great Wall of China and when was it built?

Answer briefly:"""
    
    print(f"\nContext: {context}")
    print("\nGenerating 3 samples with temperature=0.8...")
    
    result = self_consistency_check(prompt, n_samples=3, temperature=0.8)
    
    print(f"\n{'─'*65}")
    print("Responses:")
    for i, resp in enumerate(result['responses'], 1):
        resp_display = resp[:150] + "..." if len(resp) > 150 else resp
        print(f"\n  Sample {i}: {resp_display}")
    
    print(f"\n{'─'*65}")
    print("Consistency Analysis:")
    print(f"  Overall Score: {result['consistency_score']:.2f}")
    print(f"  Number Agreement: {result['number_agreement']:.2f}")
    print(f"  Entity Agreement: {result['entity_agreement']:.2f}")
    print(f"  Common Numbers: {result['common_numbers']}")
    common_ents = list(result['common_entities'])[:5]
    print(f"  Common Entities: {common_ents}{'...' if len(result['common_entities']) > 5 else ''}")
    
    if result['consistency_score'] < 0.5:
        print(f"\n⚠ LOW CONSISTENCY - Potential hallucination or uncertainty")
    else:
        print(f"\n✓ Responses are reasonably consistent")


SELF-CONSISTENCY: Multi-Sample Agreement Check

Context: The Great Wall of China is approximately 21,196 kilometers long. Construction began in 7th century BC.

Generating 3 samples with temperature=0.8...

─────────────────────────────────────────────────────────────────
Responses:

  Sample 1: Approximately 21,196 kilometers long; construction began in the 7th century BC.

  Sample 2: About 21,196 kilometers long; construction began in the 7th century BC.

  Sample 3: Approximately 21,196 kilometers long; construction began in the 7th century BC.

─────────────────────────────────────────────────────────────────
Consistency Analysis:
  Overall Score: 0.67
  Number Agreement: 1.00
  Entity Agreement: 0.33
  Common Numbers: {'21,196'}
  Common Entities: ['BC.']

✓ Responses are reasonably consistent


---

# Comparison Summary


In [25]:
# Hallucination Detection: Comparison Summary (Demos 1-3)

print("="*70)
print("HALLUCINATION DETECTION: Method Comparison (Demos 1-3)")
print("="*70)
print("""
┌─────────────────────┬──────────────┬───────────────┬─────────────────────┐
│ Method              │ Speed        │ Granularity   │ Best For            │
├─────────────────────┼──────────────┼───────────────┼─────────────────────┤
│ LettuceDetect       │ ~50-100ms    │ Token-level   │ RAG, real-time      │
│ NLI-Based           │ ~20-50ms     │ Sentence      │ Contradiction check │
│ Self-Consistency    │ N × LLM call │ Response      │ Uncertainty detect  │
│ LLM-as-Judge        │ ~1-3s        │ Semantic      │ High-stakes, audit  │
└─────────────────────┴──────────────┴───────────────┴─────────────────────┘

DETECTION CAPABILITIES:

  Intrinsic (Contradicts Context):
    ✓ LettuceDetect - Excellent (token-level)
    ✓ NLI-Based     - Good (sentence-level)
    ○ Self-Consistency - Limited
    
  Extrinsic (Fabricated Info):
    ✓ LettuceDetect - Good (flags unsupported tokens)
    ○ NLI-Based     - Limited (neutral ≠ false)
    ○ Self-Consistency - Catches uncertainty

→ See Demo 4 for HaluGate (vLLM's 3-stage pipeline)
""")

print("─"*70)
print("DEMO 1-3 STATUS")
print("─"*70)
status = [
    ("Ollama (qwen3:4b)", "✓" if ollama_ready else "✗"),
    ("LettuceDetect", "✓" if lettucedetect_available else "✗ pip install lettucedetect"),
    ("NLI Model", "✓" if nli_available else "✗ pip install transformers torch"),
]
for name, stat in status:
    print(f"  {stat} {name}")


HALLUCINATION DETECTION: Method Comparison (Demos 1-3)

┌─────────────────────┬──────────────┬───────────────┬─────────────────────┐
│ Method              │ Speed        │ Granularity   │ Best For            │
├─────────────────────┼──────────────┼───────────────┼─────────────────────┤
│ LettuceDetect       │ ~50-100ms    │ Token-level   │ RAG, real-time      │
│ NLI-Based           │ ~20-50ms     │ Sentence      │ Contradiction check │
│ Self-Consistency    │ N × LLM call │ Response      │ Uncertainty detect  │
│ LLM-as-Judge        │ ~1-3s        │ Semantic      │ High-stakes, audit  │
└─────────────────────┴──────────────┴───────────────┴─────────────────────┘

DETECTION CAPABILITIES:

  Intrinsic (Contradicts Context):
    ✓ LettuceDetect - Excellent (token-level)
    ✓ NLI-Based     - Good (sentence-level)
    ○ Self-Consistency - Limited

  Extrinsic (Fabricated Info):
    ✓ LettuceDetect - Good (flags unsupported tokens)
    ○ NLI-Based     - Limited (neutral ≠ false)
    ○ Self

---

# Demo 4: HaluGate-Style Pipeline

A 3-stage pipeline for hallucination detection without LLM-as-judge:

| Stage | Model | Function |
|-------|-------|----------|
| 1 | **Sentinel** | Classifies if prompt needs fact-checking |
| 2 | **Detector** | Claim-level hallucination detection (NLI) |
| 3 | **Explainer** | NLI explanation (contradiction/neutral/entailment) |

**Models used:**
- Sentinel: `llm-semantic-router/halugate-sentinel`
- Detector/Explainer: `cross-encoder/nli-deberta-v3-base`


In [26]:
# HaluGate-Style Pipeline: Setup and Model Loading
# 
# Models used:
# - Sentinel: llm-semantic-router/halugate-sentinel
# - Detector/Explainer: cross-encoder/nli-deberta-v3-base

print("="*65)
print("HALUGATE-STYLE: 3-Stage Hallucination Detection Pipeline")
print("="*65)

halugate_available = False
sentinel_model = None
sentinel_tokenizer = None
halugate_nli_pipeline = None

try:
    from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
    import torch
    
    # Determine device
    if torch.backends.mps.is_available():
        device = torch.device("mps")
        device_name = "MPS (Apple Silicon)"
    elif torch.cuda.is_available():
        device = torch.device("cuda")
        device_name = "CUDA"
    else:
        device = torch.device("cpu")
        device_name = "CPU"
    
    print(f"\nDevice: {device_name}")
    print("\nLoading HaluGate models...")
    
    # Stage 1: Sentinel (prompt classifier)
    print("\n[1/3] Loading Sentinel (prompt classifier)...")
    print("      Model: llm-semantic-router/halugate-sentinel")
    sentinel_tokenizer = AutoTokenizer.from_pretrained("llm-semantic-router/halugate-sentinel")
    sentinel_model = AutoModelForSequenceClassification.from_pretrained("llm-semantic-router/halugate-sentinel")
    sentinel_model.to(device)
    sentinel_model.eval()
    sentinel_id2label = sentinel_model.config.id2label
    print("  ✓ Sentinel loaded")
    
    # Stage 2 & 3: Use NLI model for both detection and explanation
    print("[2/3] Loading NLI model for detection...")
    print("      Model: cross-encoder/nli-deberta-v3-base")
    halugate_nli_pipeline = pipeline(
        "text-classification",
        model="cross-encoder/nli-deberta-v3-base",
        device=device
    )
    print("  ✓ NLI model loaded (for detection + explanation)")
    
    print("[3/3] Explainer uses same NLI model")
    print("  ✓ Shared NLI pipeline")
    
    halugate_available = True
    logger.info("HaluGate-style pipeline ready")
    print("\n" + "─"*65)
    print("✓ Pipeline ready")
    print("  • Sentinel: llm-semantic-router/halugate-sentinel")
    print("  • Detector/Explainer: cross-encoder/nli-deberta-v3-base")
    print("─"*65)
    
except ImportError as e:
    logger.warning(f"Transformers not installed: {e}")
    print("\n✗ Transformers not available")
    print("\nTo install:")
    print("  pip install transformers torch")
    
except Exception as e:
    logger.error(f"HaluGate setup failed: {e}")
    print(f"\n✗ Error loading HaluGate models: {e}")


HALUGATE-STYLE: 3-Stage Hallucination Detection Pipeline

Device: MPS (Apple Silicon)

Loading HaluGate models...

[1/3] Loading Sentinel (prompt classifier)...
      Model: llm-semantic-router/halugate-sentinel
  ✓ Sentinel loaded
[2/3] Loading NLI model for detection...
      Model: cross-encoder/nli-deberta-v3-base


Device set to use mps
[92m[INFO][0m HaluGate-style pipeline ready


  ✓ NLI model loaded (for detection + explanation)
[3/3] Explainer uses same NLI model
  ✓ Shared NLI pipeline

─────────────────────────────────────────────────────────────────
✓ Pipeline ready
  • Sentinel: llm-semantic-router/halugate-sentinel
  • Detector/Explainer: cross-encoder/nli-deberta-v3-base
─────────────────────────────────────────────────────────────────


In [27]:
# HaluGate-Style: Pipeline Implementation

print("="*65)
print("HALUGATE-STYLE: Pipeline Functions")
print("="*65)

if not halugate_available:
    print("\n⚠ HaluGate models not loaded - showing conceptual implementation")
    print("""
HALUGATE PIPELINE ARCHITECTURE:

    ┌─────────────────────────────────────────────────────────────┐
    │                     USER PROMPT                             │
    └─────────────────────────────────────────────────────────────┘
                              │
                              ▼
    ┌─────────────────────────────────────────────────────────────┐
    │  STAGE 1: SENTINEL                                          │
    │  "Does this prompt need fact-checking?"                     │
    │  → FACT_CHECK_NEEDED / NO_FACT_CHECK_NEEDED                 │
    └─────────────────────────────────────────────────────────────┘
                              │
              ┌───────────────┴───────────────┐
              │                               │
    NO_FACT_CHECK_NEEDED              FACT_CHECK_NEEDED
              │                               │
              ▼                               ▼
    ┌─────────────────┐         ┌─────────────────────────────────┐
    │ Skip detection  │         │  STAGE 2: DETECTOR (NLI)        │
    │ (72% of traffic)│         │  Claim-level classification     │
    └─────────────────┘         │  → contradiction/neutral/entail │
                                └─────────────────────────────────┘
                                              │
                                              ▼
                                ┌─────────────────────────────────┐
                                │  STAGE 3: EXPLAINER             │
                                │  Detailed NLI explanation       │
                                │  → contradiction/neutral/entail │
                                └─────────────────────────────────┘
    """)
else:
    import torch
    import torch.nn.functional as F
    
    def halugate_sentinel(prompt: str) -> Dict[str, Any]:
        """
        Stage 1: Determine if prompt needs fact-checking.
        Uses official halugate-sentinel model.
        Returns classification and confidence.
        """
        inputs = sentinel_tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
        inputs = {k: v.to(device) for k, v in inputs.items()}
        
        with torch.no_grad():
            outputs = sentinel_model(**inputs)
            probs = F.softmax(outputs.logits, dim=-1)
            pred_class = torch.argmax(probs, dim=-1).item()
            confidence = probs[0][pred_class].item()
        
        label = sentinel_id2label.get(pred_class, str(pred_class))
        needs_check = "FACT_CHECK" in label.upper() or "CHECK" in label.upper()
        
        return {
            "label": label,
            "needs_check": needs_check,
            "confidence": confidence
        }
    
    def halugate_detector(context: str, answer: str) -> Dict[str, Any]:
        """
        Stage 2: Claim-level hallucination detection using NLI.
        Checks if answer claims are supported by context.
        (Uses NLI model as halugate-detector not yet public)
        """
        # Split answer into claims
        claims = [s.strip() for s in re.split(r'[.!?]+', answer) if len(s.strip()) > 10]
        
        results = []
        contradictions = []
        unsupported = []
        
        for claim in claims:
            # NLI: context </s></s> claim
            nli_input = f"{context}</s></s>{claim}"
            prediction = halugate_nli_pipeline(nli_input)[0]
            
            result = {
                "claim": claim,
                "label": prediction["label"],
                "score": prediction["score"]
            }
            results.append(result)
            
            if prediction["label"] == "contradiction" and prediction["score"] > 0.7:
                contradictions.append(claim)
            elif prediction["label"] == "neutral" and prediction["score"] > 0.7:
                unsupported.append(claim)
        
        return {
            "claims": results,
            "contradictions": contradictions,
            "unsupported": unsupported,
            "has_hallucination": len(contradictions) > 0
        }
    
    def halugate_explainer(context: str, claim: str) -> Dict[str, Any]:
        """
        Stage 3: NLI explanation for detected hallucinations.
        Classifies relationship between context and claim.
        """
        nli_input = f"{context}</s></s>{claim}"
        prediction = halugate_nli_pipeline(nli_input)[0]
        
        return {
            "label": prediction["label"],
            "confidence": prediction["score"],
            "is_contradiction": prediction["label"] == "contradiction"
        }
    
    def halugate_full_pipeline(context: str, question: str, answer: str) -> Dict[str, Any]:
        """
        Full HaluGate pipeline: Sentinel → Detector → Explainer
        """
        result = {
            "context": context,
            "question": question,
            "answer": answer,
            "stages": {}
        }
        
        # Stage 1: Sentinel (official model)
        sentinel_result = halugate_sentinel(question)
        result["stages"]["sentinel"] = sentinel_result
        
        if not sentinel_result["needs_check"]:
            result["final_verdict"] = "SKIPPED"
            result["explanation"] = "Prompt does not require fact-checking"
            return result
        
        # Stage 2: Detector (NLI-based)
        detector_result = halugate_detector(context, answer)
        result["stages"]["detector"] = {
            "contradictions": detector_result["contradictions"],
            "unsupported": detector_result["unsupported"],
            "has_hallucination": detector_result["has_hallucination"]
        }
        
        if not detector_result["has_hallucination"] and not detector_result["unsupported"]:
            result["final_verdict"] = "PASSED"
            result["explanation"] = "All claims supported by context"
            return result
        
        # Stage 3: Explainer (provides verdict)
        if detector_result["contradictions"]:
            # Explain the first contradiction
            explainer_result = halugate_explainer(context, detector_result["contradictions"][0])
            result["stages"]["explainer"] = explainer_result
            result["final_verdict"] = "HALLUCINATION_DETECTED"
            result["explanation"] = f"Response contradicts context: '{detector_result['contradictions'][0][:50]}...'"
        elif detector_result["unsupported"]:
            explainer_result = halugate_explainer(context, detector_result["unsupported"][0])
            result["stages"]["explainer"] = explainer_result
            result["final_verdict"] = "UNSUPPORTED"
            result["explanation"] = f"Claim not in context: '{detector_result['unsupported'][0][:50]}...'"
        else:
            result["final_verdict"] = "PASSED"
            result["explanation"] = "Response is supported by context"
        
        return result
    
    print("✓ Pipeline functions defined:")
    print("  • halugate_sentinel(prompt) → needs_check, confidence")
    print("  • halugate_detector(context, answer) → contradictions, unsupported")
    print("  • halugate_explainer(context, claim) → entailment/neutral/contradiction")
    print("  • halugate_full_pipeline(context, question, answer) → full analysis")


HALUGATE-STYLE: Pipeline Functions
✓ Pipeline functions defined:
  • halugate_sentinel(prompt) → needs_check, confidence
  • halugate_detector(context, answer) → contradictions, unsupported
  • halugate_explainer(context, claim) → entailment/neutral/contradiction
  • halugate_full_pipeline(context, question, answer) → full analysis


In [28]:
# HaluGate-Style: Demo with Test Cases

print("="*65)
print("HALUGATE-STYLE: Full Pipeline Demo")
print("="*65)

# Test cases covering different scenarios
halugate_test_cases = [
    {
        "name": "Creative Prompt (Should Skip)",
        "context": "",
        "question": "Write a poem about autumn leaves",
        "answer": "Golden leaves fall gently down, painting nature's carpet on the ground."
    },
    {
        "name": "Factual - Correct Answer",
        "context": "The Eiffel Tower is located in Paris, France. It was completed in 1889 and is 330 meters tall.",
        "question": "Where is the Eiffel Tower and how tall is it?",
        "answer": "The Eiffel Tower is located in Paris, France. It stands 330 meters tall."
    },
    {
        "name": "Factual - Intrinsic Hallucination",
        "context": "The Eiffel Tower is located in Paris, France. It was completed in 1889.",
        "question": "Where is the Eiffel Tower located?",
        "answer": "The Eiffel Tower is located in Berlin, Germany."
    },
    {
        "name": "Factual - Extrinsic Hallucination",
        "context": "Apple Inc. was founded in 1976 by Steve Jobs and Steve Wozniak.",
        "question": "Who founded Apple?",
        "answer": "Apple was founded by Steve Jobs and Steve Wozniak in 1976. They started the company in Bill Gates' garage."
    },
    {
        "name": "RAG Scenario - Tool Response",
        "context": "Order #12345 was placed on 2024-01-15. Status: Shipped. Expected delivery: 2024-01-20.",
        "question": "What's the status of order #12345?",
        "answer": "Order #12345 was shipped on January 15th and will arrive by January 20th."
    }
]

if not halugate_available:
    print("\n⚠ HaluGate models not loaded - showing expected outputs")
    print("\nExpected behavior for test cases:")
    for case in halugate_test_cases:
        print(f"\n[{case['name']}]")
        print(f"  Question: {case['question'][:50]}...")
        if "Creative" in case['name']:
            print("  → Sentinel: NO_FACT_CHECK_NEEDED → Skip detection")
        elif "Correct" in case['name'] or "RAG" in case['name']:
            print("  → Sentinel: FACT_CHECK_NEEDED → Detector: No hallucination → PASSED")
        else:
            print("  → Sentinel: FACT_CHECK_NEEDED → Detector: Hallucination → Explainer: contradiction")
else:
    for case in halugate_test_cases:
        print(f"\n{'─'*65}")
        print(f"[{case['name']}]")
        print(f"{'─'*65}")
        
        if case['context']:
            ctx_display = case['context'][:60] + "..." if len(case['context']) > 60 else case['context']
            print(f"Context: {ctx_display}")
        print(f"Question: {case['question']}")
        print(f"Answer: {case['answer']}")
        
        # Run full pipeline
        result = halugate_full_pipeline(
            context=case['context'],
            question=case['question'],
            answer=case['answer']
        )
        
        print(f"\nPipeline Results:")
        
        # Stage 1
        sentinel = result['stages']['sentinel']
        print(f"  [Sentinel] {sentinel['label']} (confidence: {sentinel['confidence']:.2f})")
        
        # Stage 2 (if ran)
        if 'detector' in result['stages']:
            detector = result['stages']['detector']
            if detector['contradictions']:
                print(f"  [Detector] ⚠ Contradictions: {len(detector['contradictions'])} found")
            elif detector['unsupported']:
                print(f"  [Detector] ○ Unsupported claims: {len(detector['unsupported'])} found")
            else:
                print(f"  [Detector] ✓ All claims supported")
        
        # Stage 3 (if ran)
        if 'explainer' in result['stages']:
            explainer = result['stages']['explainer']
            symbol = "⚠" if explainer['is_contradiction'] else "○" if explainer['label'] == 'neutral' else "✓"
            print(f"  [Explainer] {symbol} {explainer['label']} (confidence: {explainer['confidence']:.2f})")
        
        # Final verdict
        verdict_symbol = {
            "PASSED": "✓",
            "SKIPPED": "○",
            "HALLUCINATION_DETECTED": "⚠",
            "UNSUPPORTED": "⚠"
        }.get(result['final_verdict'], "?")
        
        print(f"\n  {verdict_symbol} VERDICT: {result['final_verdict']}")
        print(f"    {result['explanation']}")


HALUGATE-STYLE: Full Pipeline Demo

─────────────────────────────────────────────────────────────────
[Creative Prompt (Should Skip)]
─────────────────────────────────────────────────────────────────
Question: Write a poem about autumn leaves
Answer: Golden leaves fall gently down, painting nature's carpet on the ground.

Pipeline Results:
  [Sentinel] NO_FACT_CHECK_NEEDED (confidence: 1.00)
  [Detector] ○ Unsupported claims: 1 found
  [Explainer] ○ neutral (confidence: 0.99)

  ⚠ VERDICT: UNSUPPORTED
    Claim not in context: 'Golden leaves fall gently down, painting nature's ...'

─────────────────────────────────────────────────────────────────
[Factual - Correct Answer]
─────────────────────────────────────────────────────────────────
Context: The Eiffel Tower is located in Paris, France. It was complet...
Question: Where is the Eiffel Tower and how tall is it?
Answer: The Eiffel Tower is located in Paris, France. It stands 330 meters tall.

Pipeline Results:
  [Sentinel] FACT_CHEC

In [29]:
# HaluGate-Style: Live Demo with Ollama-Generated Responses

print("="*65)
print("HALUGATE-STYLE: Live Demo with Ollama")
print("="*65)

if not ollama_ready:
    print("\n⚠ Ollama not running - skipping live demo")
elif not halugate_available:
    print("\n⚠ Pipeline models not available - skipping live demo")
else:
    # RAG-style scenarios
    live_scenarios = [
        {
            "name": "Product Documentation",
            "context": """TechCorp Pro Plan: $50/month
            - 1TB storage
            - 25 user seats
            - 24/7 email support
            - API access included
            Phone support is NOT included in Pro plan.""",
            "question": "What does the Pro plan include?"
        },
        {
            "name": "Historical Query",
            "context": """The first iPhone was announced by Steve Jobs on January 9, 2007.
            It went on sale on June 29, 2007 in the United States.
            The original iPhone had a 3.5-inch screen and 2 megapixel camera.""",
            "question": "When was the first iPhone released and what were its specs?"
        }
    ]
    
    for scenario in live_scenarios:
        print(f"\n{'═'*65}")
        print(f"[{scenario['name']}]")
        print(f"{'═'*65}")
        print(f"Context: {scenario['context'][:80]}...")
        print(f"Question: {scenario['question']}")
        
        # Generate response with Ollama
        prompt = f"""Based ONLY on the following context, answer the question.
Do not add any information not present in the context.

Context: {scenario['context']}

Question: {scenario['question']}

Answer:"""
        
        print("\n[Generating response with Ollama...]")
        response = ollama_generate(prompt, temperature=0.5)
        response = clean_response(response)
        
        resp_display = response[:150] + "..." if len(response) > 150 else response
        print(f"Response: {resp_display}")
        
        # Run pipeline
        print(f"\n{'─'*65}")
        print("Pipeline Analysis:")
        print(f"{'─'*65}")
        
        result = halugate_full_pipeline(
            context=scenario['context'],
            question=scenario['question'],
            answer=response
        )
        
        # Stage results
        sentinel = result['stages']['sentinel']
        print(f"  [1] Sentinel: {sentinel['label']} ({sentinel['confidence']:.2f})")
        
        if 'detector' in result['stages']:
            detector = result['stages']['detector']
            if detector['contradictions']:
                print(f"  [2] Detector: ⚠ {len(detector['contradictions'])} contradiction(s) found")
                for c in detector['contradictions'][:2]:
                    print(f"      → \"{c[:50]}...\"")
            elif detector['unsupported']:
                print(f"  [2] Detector: ○ {len(detector['unsupported'])} unsupported claim(s)")
            else:
                print(f"  [2] Detector: ✓ All claims grounded")
        
        if 'explainer' in result['stages']:
            explainer = result['stages']['explainer']
            print(f"  [3] Explainer: {explainer['label']} ({explainer['confidence']:.2f})")
        
        # Verdict
        print(f"\n  ══► VERDICT: {result['final_verdict']}")
        print(f"      {result['explanation']}")


HALUGATE-STYLE: Live Demo with Ollama

═════════════════════════════════════════════════════════════════
[Product Documentation]
═════════════════════════════════════════════════════════════════
Context: TechCorp Pro Plan: $50/month
            - 1TB storage
            - 25 user sea...
Question: What does the Pro plan include?

[Generating response with Ollama...]
Response: 1TB storage  
25 user seats  
24/7 email support  
API access

─────────────────────────────────────────────────────────────────
Pipeline Analysis:
─────────────────────────────────────────────────────────────────
  [1] Sentinel: FACT_CHECK_NEEDED (1.00)
  [2] Detector: ○ 1 unsupported claim(s)
  [3] Explainer: neutral (0.98)

  ══► VERDICT: UNSUPPORTED
      Claim not in context: '1TB storage  
25 user seats  
24/7 email support  ...'

═════════════════════════════════════════════════════════════════
[Historical Query]
═════════════════════════════════════════════════════════════════
Context: The first iPhone was 

---

# Demo 5: Grounded Prompting (Mitigation via Prompt Engineering)

Reduce hallucinations through **prompt structure**, not additional models:

| Technique | Effect |
|-----------|--------|
| Explicit grounding instruction | "Answer ONLY based on context" |
| Context before question | Recency bias works in your favor |
| "I don't know" permission | Model won't fabricate to please |
| Citation requirement | Forces grounding in source |

This is the cheapest mitigation — no extra models, just better prompts.


In [31]:
# Grounded Prompting: Implementation

print("="*65)
print("GROUNDED PROMPTING: Mitigation via Prompt Engineering")
print("="*65)

def build_grounded_prompt(
    query: str,
    retrieved_context: str,
    instructions: str = ""
) -> str:
    """
    Build a prompt that encourages grounded responses.
    
    Key techniques:
    1. Explicit grounding instruction
    2. Context before question (recency bias)
    3. "I don't know" permission
    4. Citation requirement
    """
    return f"""You are a helpful assistant that answers questions based ONLY on the provided context.

RULES:
- Answer ONLY based on information in the CONTEXT below
- If the context doesn't contain the answer, say "I don't have information about that in the provided documents"
- Quote or paraphrase directly from the context
- Never make up information

CONTEXT:
{retrieved_context}

QUESTION: {query}

{instructions}

Provide your answer, citing the relevant parts of the context:"""


def build_ungrounded_prompt(query: str, context: str) -> str:
    """Simple prompt without grounding techniques."""
    return f"""Context: {context}

Question: {query}

Answer:"""


print("✓ Prompt builders defined:")
print("  • build_grounded_prompt(query, context) → Grounded prompt with rules")
print("  • build_ungrounded_prompt(query, context) → Simple prompt (baseline)")


GROUNDED PROMPTING: Mitigation via Prompt Engineering
✓ Prompt builders defined:
  • build_grounded_prompt(query, context) → Grounded prompt with rules
  • build_ungrounded_prompt(query, context) → Simple prompt (baseline)


In [32]:
# Grounded Prompting: Comparison Demo

print("="*65)
print("GROUNDED vs UNGROUNDED: Side-by-Side Comparison")
print("="*65)

if not ollama_ready:
    print("\n⚠ Ollama not running - skipping live demo")
else:
    # Test cases designed to tempt hallucination
    grounding_tests = [
        {
            "name": "Question Beyond Context",
            "context": "TechCorp was founded in 2015 in Austin, Texas. The company has 500 employees.",
            "question": "Who is the CEO of TechCorp and what is the company's revenue?"
        },
        {
            "name": "Tempting Fabrication",
            "context": "The Apollo 11 mission landed on the Moon on July 20, 1969. Neil Armstrong and Buzz Aldrin walked on the lunar surface.",
            "question": "What did Neil Armstrong say when he first stepped on the Moon, and how long did the moonwalk last?"
        },
        {
            "name": "Partial Information",
            "context": "Python 3.12 was released in October 2023. It includes performance improvements and better error messages.",
            "question": "What are all the new features in Python 3.12 and who developed them?"
        }
    ]
    
    for test in grounding_tests:
        print(f"\n{'═'*65}")
        print(f"[{test['name']}]")
        print(f"{'═'*65}")
        print(f"Context: {test['context'][:80]}...")
        print(f"Question: {test['question']}")
        
        # Generate with ungrounded prompt
        print(f"\n{'─'*65}")
        print("UNGROUNDED PROMPT:")
        ungrounded_prompt = build_ungrounded_prompt(test['question'], test['context'])
        ungrounded_response = clean_response(ollama_generate(ungrounded_prompt, temperature=0.7))
        ungrounded_display = ungrounded_response[:200] + "..." if len(ungrounded_response) > 200 else ungrounded_response
        print(f"  Response: {ungrounded_display}")
        
        # Generate with grounded prompt
        print(f"\n{'─'*65}")
        print("GROUNDED PROMPT:")
        grounded_prompt = build_grounded_prompt(test['question'], test['context'])
        grounded_response = clean_response(ollama_generate(grounded_prompt, temperature=0.7))
        grounded_display = grounded_response[:200] + "..." if len(grounded_response) > 200 else grounded_response
        print(f"  Response: {grounded_display}")
        
        # Analyze with NLI if available
        if nli_available:
            print(f"\n{'─'*65}")
            print("NLI ANALYSIS:")
            
            ungrounded_result = nli_detect_hallucination(test['context'], ungrounded_response)
            grounded_result = nli_detect_hallucination(test['context'], grounded_response)
            
            ungrounded_issues = len([c for c in ungrounded_result['all_claims'] if c['label'] != 'entailment'])
            grounded_issues = len([c for c in grounded_result['all_claims'] if c['label'] != 'entailment'])
            
            print(f"  Ungrounded: {len(ungrounded_result['all_claims'])} claims, {ungrounded_issues} potentially ungrounded")
            print(f"  Grounded:   {len(grounded_result['all_claims'])} claims, {grounded_issues} potentially ungrounded")
            
            if grounded_issues < ungrounded_issues:
                print("  ✓ Grounded prompt produced more faithful response")
            elif grounded_issues == ungrounded_issues:
                print("  ○ Similar faithfulness (model may have behaved well in both)")
            else:
                print("  ⚠ Unexpected: grounded prompt had more issues")


GROUNDED vs UNGROUNDED: Side-by-Side Comparison

═════════════════════════════════════════════════════════════════
[Question Beyond Context]
═════════════════════════════════════════════════════════════════
Context: TechCorp was founded in 2015 in Austin, Texas. The company has 500 employees....
Question: Who is the CEO of TechCorp and what is the company's revenue?

─────────────────────────────────────────────────────────────────
UNGROUNDED PROMPT:
  Response: Based **solely on the provided context**, **the answer cannot be determined**. Here's why:

1.  **The context does not mention the CEO**: The context only states that "TechCorp was founded in 2015 in ...

─────────────────────────────────────────────────────────────────
GROUNDED PROMPT:
  Response: I don't have information about that in the provided documents

The provided context only states: "TechCorp was founded in 2015 in Austin, Texas. The company has 500 employees." There is no mention of ...

────────────────────────────

In [33]:
# Final Summary: All Hallucination Detection Methods

print("="*70)
print("HALLUCINATION DETECTION: Complete Summary")
print("="*70)
print("""
┌───────────────────┬─────────────┬──────────────┬──────────────────┐
│ Method            │ Granularity │ Stages       │ Use Case         │
├───────────────────┼─────────────┼──────────────┼──────────────────┤
│ LettuceDetect     │ Token       │ 1 (detect)   │ RAG pipelines    │
│ NLI-Based         │ Sentence    │ 1 (classify) │ Contradiction    │
│ Self-Consistency  │ Response    │ N samples    │ Uncertainty      │
│ HaluGate-Style    │ Claim+NLI   │ 3 (S→D→E)    │ Conditional RAG  │
│ Grounded Prompt   │ Prompt      │ 0 (prevent)  │ All pipelines    │
│ LLM-as-Judge      │ Semantic    │ 1 (judge)    │ High-stakes      │
└───────────────────┴─────────────┴──────────────┴──────────────────┘

PIPELINE ARCHITECTURE (Demo 4):

  ┌─────────────────────────────────────────────────────────────────────┐
  │   [User Query] ──► [Sentinel] ──► Skip? ──► Direct LLM             │
  │                         │                                           │
  │                         ▼ (needs fact-check)                        │
  │                                                                     │
  │   [LLM Response] ──► [Detector/NLI] ──► No issue? ──► ✓            │
  │                         │                                           │
  │                         ▼ (contradiction/neutral)                   │
  │                                                                     │
  │   [Explainer] ──► Flag/Block/Regenerate                            │
  └─────────────────────────────────────────────────────────────────────┘

MODELS USED:

  Demo 1: KRLabsOrg/lettuce-detect-base-modernbert-en-v1
  Demo 2: cross-encoder/nli-deberta-v3-base
  Demo 3: (uses LLM directly)
  Demo 4: llm-semantic-router/halugate-sentinel
          cross-encoder/nli-deberta-v3-base
  Demo 5: (prompt engineering, no model)
""")

print("\n" + "="*70)
print("ALL DEMOS STATUS")
print("="*70)
all_status = [
    ("Ollama (qwen3:4b)", "✓" if ollama_ready else "✗"),
    ("LettuceDetect", "✓" if lettucedetect_available else "✗"),
    ("NLI Model", "✓" if nli_available else "✗"),
    ("HaluGate (3 models)", "✓" if halugate_available else "✗"),
]
for name, stat in all_status:
    print(f"  {stat} {name}")


HALLUCINATION DETECTION: Complete Summary

┌───────────────────┬─────────────┬──────────────┬──────────────────┐
│ Method            │ Granularity │ Stages       │ Use Case         │
├───────────────────┼─────────────┼──────────────┼──────────────────┤
│ LettuceDetect     │ Token       │ 1 (detect)   │ RAG pipelines    │
│ NLI-Based         │ Sentence    │ 1 (classify) │ Contradiction    │
│ Self-Consistency  │ Response    │ N samples    │ Uncertainty      │
│ HaluGate-Style    │ Claim+NLI   │ 3 (S→D→E)    │ Conditional RAG  │
│ Grounded Prompt   │ Prompt      │ 0 (prevent)  │ All pipelines    │
│ LLM-as-Judge      │ Semantic    │ 1 (judge)    │ High-stakes      │
└───────────────────┴─────────────┴──────────────┴──────────────────┘

PIPELINE ARCHITECTURE (Demo 4):

  ┌─────────────────────────────────────────────────────────────────────┐
  │   [User Query] ──► [Sentinel] ──► Skip? ──► Direct LLM             │
  │                         │                                           │
  