# Multi-Layered Framework for LLM Hallucination Mitigation

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yourusername/multi-layered-llm-hallucination-mitigation/blob/main/LLM_Hallucination_Mitigation_Demo.ipynb)

This notebook provides an interactive demonstration of our multi-layered hallucination mitigation framework. You can run this directly in Google Colab or locally with Jupyter.

**Paper:** Hiriyanna, S., & Zhao, W. (2025). Multi-Layered Framework for LLM Hallucination Mitigation in High-Stakes Applications: A Tutorial. *Computers*, MDPI.

## 1. Setup and Installation

First, let's install the required dependencies and set up your API keys.

In [None]:
# Install required packages
!pip install -q openai chromadb numpy tiktoken sentence-transformers matplotlib

In [None]:
import os
import json
import numpy as np
from typing import List, Dict, Any, Optional, Tuple
from dataclasses import dataclass
from enum import Enum
import warnings
warnings.filterwarnings('ignore')

# For API key input
from getpass import getpass

## 2. API Key Configuration

Enter your OpenAI API key below. It will be hidden for security.

In [None]:
# Securely input your OpenAI API key
OPENAI_API_KEY = getpass('Enter your OpenAI API key: ')
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY

# Initialize OpenAI client
import openai
openai.api_key = OPENAI_API_KEY

print("✓ API key configured successfully")

## 3. Framework Implementation

### 3.1 Domain Classification

In [None]:
class Domain(Enum):
    GENERAL = "general"
    FINANCIAL = "financial"
    COMPLIANCE = "compliance"

@dataclass
class DomainConfig:
    domain: Domain
    confidence_threshold: float
    keywords: List[str]

# Domain configurations with confidence thresholds
DOMAIN_CONFIGS = {
    Domain.FINANCIAL: DomainConfig(
        domain=Domain.FINANCIAL,
        confidence_threshold=0.80,
        keywords=["fee", "investment", "fund", "rate", "return", "portfolio", "401k", "IRA"]
    ),
    Domain.COMPLIANCE: DomainConfig(
        domain=Domain.COMPLIANCE,
        confidence_threshold=0.85,
        keywords=["regulation", "compliance", "legal", "policy", "rule", "requirement"]
    ),
    Domain.GENERAL: DomainConfig(
        domain=Domain.GENERAL,
        confidence_threshold=0.75,
        keywords=[]
    )
}

def classify_domain(query: str) -> Domain:
    """Classify query into domain based on keywords."""
    query_lower = query.lower()
    
    for domain, config in DOMAIN_CONFIGS.items():
        if domain != Domain.GENERAL:
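            # Lowercase each keyword so mixed-case entries such as "IRA" can match the lowercased query
            if any(keyword.lower() in query_lower for keyword in config.keywords):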
                return domain
    
    return Domain.GENERAL

print("Domain classification module loaded")
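The substring test in `classify_domain` is simple but permissive: a keyword such as "rate" also matches inside "corporate". Below is a minimal sketch of a stricter whole-word variant. The helper name `classify_by_whole_words` and the pared-down `KEYWORDS` table are assumptions made for illustration and are not part of the framework. Whole-word matching also has a cost: plurals such as "fees" only match if they are listed explicitly.

```python
import re
from typing import Dict, List

# Hypothetical, pared-down keyword table for illustration only.
KEYWORDS: Dict[str, List[str]] = {
    "financial": ["fee", "fees", "fund", "rate", "IRA"],
    "compliance": ["regulation", "policy", "rule"],
}

def classify_by_whole_words(query: str, keywords: Dict[str, List[str]] = KEYWORDS) -> str:
    """Return the first domain with a keyword present as a whole word, else 'general'."""
    # Split the lowercased query into alphanumeric tokens.
    tokens = set(re.findall(r"[a-z0-9]+", query.lower()))
    for domain, words in keywords.items():
        if any(word.lower() in tokens for word in words):
            return domain
    return "general"

print(classify_by_whole_words("What is the rate on my IRA?"))      # financial
print(classify_by_whole_words("Tell me about corporate culture"))  # general
```

Whether the stricter or the more permissive test is preferable depends on how costly a misrouted query is in your domain.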

### 3.2 Prompt Engineering Layer

In [None]:
class PromptEngineer:
    """Implements prompt engineering techniques."""
    
    @staticmethod
    def create_few_shot_prompt(query: str, examples: List[Dict[str, str]]) -> str:
        """Create few-shot prompt with examples."""
        prompt = "You are a helpful assistant. Here are some examples:\n\n"
        
        for example in examples:
            prompt += f"Q: {example['question']}\n"
            prompt += f"A: {example['answer']}\n\n"
        
        prompt += f"Q: {query}\nA:"
        return prompt
    
    @staticmethod
    def create_role_prompt(query: str, role: str) -> str:
        """Create role-playing prompt."""
        return f"""You are {role}. 
Your responses must be accurate, grounded in facts, and acknowledge uncertainty when appropriate.
If you're not certain about something, say so clearly.

Question: {query}
Answer:"""
    
    @staticmethod
    def create_cot_prompt(query: str) -> str:
        """Create chain-of-thought prompt."""
        return f"""Let's think about this step-by-step.

Question: {query}

Please reason through this systematically:
1. First, identify what information is being requested
2. Consider what facts are relevant
3. Think about any limitations or uncertainties
4. Provide your answer based on this reasoning

Reasoning and Answer:"""

prompt_engineer = PromptEngineer()
print("Prompt engineering layer loaded")
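The few-shot template is defined above but not exercised elsewhere in this notebook, so here is a short usage sketch. The function `build_few_shot_prompt` is a standalone copy of the template (a hypothetical name, so the cell runs on its own), and the example question/answer pair is illustrative, based on the Global Stability Fund entry in the sample knowledge base of Section 3.3.

```python
from typing import Dict, List

def build_few_shot_prompt(query: str, examples: List[Dict[str, str]]) -> str:
    """Standalone copy of the few-shot template: examples first, then the open question."""
    parts = ["You are a helpful assistant. Here are some examples:\n"]
    for example in examples:
        parts.append(f"Q: {example['question']}\nA: {example['answer']}\n")
    parts.append(f"Q: {query}\nA:")
    return "\n".join(parts)

# Illustrative example pair; replace with vetted pairs from your own domain.
examples = [
    {
        "question": "What is the management fee of the Global Stability Fund?",
        "answer": "The annual management fee is 0.50%, with no load fees.",
    }
]

prompt = build_few_shot_prompt("What are the fees for the Quantum Investment Fund?", examples)
print(prompt)
```

The prompt ends with an open `A:` so the model completes the final answer in the same format as the examples.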

### 3.3 RAG Layer Implementation

In [None]:
# Sample knowledge base for demonstration
SAMPLE_KNOWLEDGE_BASE = [
    {
        "id": "doc_1",
        "content": "The Quantum Investment Fund has an annual management fee of 0.75% and a front-load fee of 2%. The minimum investment is $10,000.",
        "metadata": {"source": "Fund Prospectus", "type": "product_info"}
    },
    {
        "id": "doc_2",
        "content": "Professional Trader Accounts require: minimum annual income of $150,000 OR net worth of $1,000,000, high risk tolerance, and at least 5 years of trading experience.",
        "metadata": {"source": "Account Requirements", "type": "compliance"}
    },
    {
        "id": "doc_3",
        "content": "Our Global Stability Fund focuses on low-risk investments with an annual management fee of 0.50% and no load fees. Suitable for conservative investors.",
        "metadata": {"source": "Fund Information", "type": "product_info"}
    },
    {
        "id": "doc_4",
        "content": "All investment recommendations must include risk disclosures and past performance warnings as per SEC regulations.",
        "metadata": {"source": "Compliance Manual", "type": "compliance"}
    }
]

class SimpleRAG:
    """Simplified RAG implementation for demonstration."""
    
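    def __init__(self, documents: List[Dict[str, Any]]):
        self.documents = documents
        self.client = openai.OpenAI()
        # Cache embeddings by text so documents are not re-embedded on every query
        self._embedding_cache: Dict[str, List[float]] = {}
    
    def get_embedding(self, text: str) -> List[float]:
        """Get embedding for text using OpenAI, reusing cached results."""
        if text not in self._embedding_cache:
            response = self.client.embeddings.create(
                model="text-embedding-ada-002",
                input=text
            )
            self._embedding_cache[text] = response.data[0].embedding
        return self._embedding_cache[text]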
    
    def cosine_similarity(self, a: List[float], b: List[float]) -> float:
        """Calculate cosine similarity between two vectors."""
        a_np = np.array(a)
        b_np = np.array(b)
        return np.dot(a_np, b_np) / (np.linalg.norm(a_np) * np.linalg.norm(b_np))
    
    def retrieve(self, query: str, top_k: int = 3, threshold: float = 0.75) -> Tuple[List[Dict], float]:
        """Retrieve relevant documents and calculate confidence."""
        query_embedding = self.get_embedding(query)
        
        # Calculate similarities
        similarities = []
        for doc in self.documents:
            doc_embedding = self.get_embedding(doc['content'])
            similarity = self.cosine_similarity(query_embedding, doc_embedding)
            similarities.append((similarity, doc))
        
        # Sort by similarity
        similarities.sort(key=lambda x: x[0], reverse=True)
        
        # Filter by threshold and get top-k
        relevant_docs = []
        scores = []
        for similarity, doc in similarities[:top_k]:
            if similarity >= threshold:
                relevant_docs.append(doc)
                scores.append(similarity)
        
        # Calculate confidence (weighted average of similarities)
        if scores:
            weights = [1.0, 0.5, 0.33][:len(scores)]
            confidence = sum(s * w for s, w in zip(scores, weights)) / sum(weights)
        else:
            confidence = 0.0
        
        return relevant_docs, confidence

# Initialize RAG with sample knowledge base
rag_system = SimpleRAG(SAMPLE_KNOWLEDGE_BASE)
print("RAG layer initialized with sample knowledge base")
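To make the confidence calculation in `retrieve` concrete, here is the same rank-weighted average as a standalone function. The name `weighted_confidence` and the similarity scores are hypothetical; the weights `[1.0, 0.5, 0.33]` are the ones used above.

```python
from typing import List

def weighted_confidence(scores: List[float]) -> float:
    """Rank-weighted mean of similarity scores, best match first."""
    if not scores:
        return 0.0
    weights = [1.0, 0.5, 0.33][:len(scores)]
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

# Hypothetical similarity scores, already sorted in descending order.
print(round(weighted_confidence([0.90, 0.80, 0.70]), 3))  # 0.837
print(round(weighted_confidence([0.90]), 3))              # 0.9
print(weighted_confidence([]))                            # 0.0
```

For three matches the arithmetic is (0.90 × 1.0 + 0.80 × 0.5 + 0.70 × 0.33) / 1.83 ≈ 0.837. Because the average includes the weaker matches, one strong match on its own scores higher than the same match accompanied by weaker ones.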

### 3.4 Multi-Layered Framework Agent

In [None]:
class HallucinationMitigationFramework:
    """Main framework combining all layers."""
    
    def __init__(self, rag_system: SimpleRAG):
        self.rag = rag_system
        self.client = openai.OpenAI()
        self.prompt_engineer = PromptEngineer()
    
    def process_query(self, query: str, verbose: bool = True) -> Dict[str, Any]:
        """Process query through the multi-layered framework."""
        
        # Step 1: Domain Classification
        domain = classify_domain(query)
        threshold = DOMAIN_CONFIGS[domain].confidence_threshold
        
        if verbose:
            print(f"📊 Domain: {domain.value} (threshold: {threshold})")
        
        # Step 2: RAG Retrieval
        relevant_docs, confidence = self.rag.retrieve(query, threshold=0.7)
        
        if verbose:
            print(f"🔍 Retrieved {len(relevant_docs)} relevant documents")
            print(f"📈 Confidence: {confidence:.3f}")
        
        # Step 3: Decision based on confidence
        if confidence < threshold:
            if verbose:
                print("⚠️ Confidence below threshold - escalating to human")
            return {
                "status": "escalated",
                "response": "I don't have sufficient information to answer this question accurately. Please contact a human representative for assistance.",
                "confidence": confidence,
                "domain": domain.value
            }
        
        # Step 4: Generate response with context
        context = "\n\n".join([doc['content'] for doc in relevant_docs])
        
        # Apply prompt engineering
        if domain == Domain.FINANCIAL:
            role = "a financial advisor assistant"
            prompt = self.prompt_engineer.create_role_prompt(query, role)
        elif domain == Domain.COMPLIANCE:
            prompt = self.prompt_engineer.create_cot_prompt(query)
        else:
            prompt = query
        
        # Add context to prompt
        full_prompt = f"""Based on the following verified information:

{context}

{prompt}

Important: Only use information from the provided context. If the answer is not in the context, say so."""
        
        # Generate response
        response = self.client.chat.completions.create(
            model="gpt-4-turbo-preview",
            messages=[{"role": "user", "content": full_prompt}],
            temperature=0.0,
            max_tokens=500
        )
        
        if verbose:
            print("✅ Response generated successfully")
        
        return {
            "status": "success",
            "response": response.choices[0].message.content,
            "confidence": confidence,
            "domain": domain.value,
            "sources": [doc['metadata']['source'] for doc in relevant_docs]
        }

# Initialize the framework
framework = HallucinationMitigationFramework(rag_system)
print("✅ Multi-layered framework initialized and ready!")
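The escalation decision in Step 3 can be examined in isolation. The sketch below copies the thresholds from `DOMAIN_CONFIGS` into a plain dictionary; the helper name `should_escalate` and the confidence value are hypothetical. Note that `process_query` calls retrieval with `threshold=0.7`, so every retained document has a similarity of at least 0.7, and the resulting confidence is either 0.0 (nothing retrieved) or at least 0.7.

```python
from typing import Dict

# Thresholds copied from DOMAIN_CONFIGS.
THRESHOLDS: Dict[str, float] = {"general": 0.75, "financial": 0.80, "compliance": 0.85}

def should_escalate(confidence: float, domain: str) -> bool:
    """True when retrieval confidence falls below the domain's threshold."""
    return confidence < THRESHOLDS[domain]

# The same hypothetical confidence is accepted or escalated depending on the domain.
for domain in THRESHOLDS:
    print(f"{domain}: escalate={should_escalate(0.82, domain)}")
```

With a confidence of 0.82, general and financial queries are answered while a compliance query is escalated, which is the intended effect of the stricter threshold for higher-stakes domains.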

## 4. Interactive Demo

Now let's test the framework with some example queries. You can modify these or add your own!

In [None]:
# Test queries demonstrating different scenarios
TEST_QUERIES = [
    "What are the fees for the Quantum Investment Fund?",
    "What is the minimum investment for the Global Stability Fund?",
    "Am I eligible for a Professional Trader Account with $100,000 income?",
    "Tell me about the XYZ fund",  # Non-existent fund - should escalate
    "What is the weather today?"  # Off-topic - should handle appropriately
]

print("Select a query to test or enter your own:\n")
for i, query in enumerate(TEST_QUERIES, 1):
    print(f"{i}. {query}")
print("\n0. Enter custom query")

In [None]:
# Interactive query selection
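choice = input("\nEnter your choice (0-5): ").strip()

if choice == "0":
    query = input("Enter your custom query: ")
elif choice.isdigit() and 1 <= int(choice) <= len(TEST_QUERIES):
    query = TEST_QUERIES[int(choice) - 1]
else:
    # Reject non-numeric or out-of-range input instead of failing or picking the wrong query
    raise ValueError(f"Invalid choice {choice!r}; enter a number from 0 to {len(TEST_QUERIES)}")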

print(f"\n{'='*60}")
print(f"Query: {query}")
print(f"{'='*60}\n")

# Process the query
result = framework.process_query(query, verbose=True)

print(f"\n{'='*60}")
print("FRAMEWORK RESPONSE:")
print(f"{'='*60}")
print(f"\nStatus: {result['status']}")
print(f"Domain: {result['domain']}")
print(f"Confidence: {result['confidence']:.3f}")
if 'sources' in result:
    print(f"Sources: {', '.join(result['sources'])}")
print(f"\nResponse:\n{result['response']}")

## 5. Batch Evaluation

Let's run all test queries to see how the framework performs across different scenarios.

In [None]:
def evaluate_framework(queries: List[str]):
    """Evaluate framework on multiple queries."""
    results = []
    
    for query in queries:
        print(f"\nProcessing: {query[:50]}...")
        result = framework.process_query(query, verbose=False)
        results.append({
            "query": query,
            "status": result['status'],
            "confidence": result['confidence'],
            "domain": result['domain']
        })
    
    return results

# Run evaluation
print("Running batch evaluation...\n")
evaluation_results = evaluate_framework(TEST_QUERIES)

# Display results in a table
print("\n" + "="*80)
print(f"{'Query':<40} {'Status':<12} {'Domain':<12} {'Confidence':<10}")
print("="*80)

for result in evaluation_results:
    query_short = result['query'][:37] + "..." if len(result['query']) > 40 else result['query']
    print(f"{query_short:<40} {result['status']:<12} {result['domain']:<12} {result['confidence']:.3f}")

# Calculate statistics
successful = sum(1 for r in evaluation_results if r['status'] == 'success')
escalated = sum(1 for r in evaluation_results if r['status'] == 'escalated')

print("\n" + "="*80)
print(f"Summary: {successful}/{len(evaluation_results)} successful, {escalated}/{len(evaluation_results)} escalated")
print(f"Average confidence: {np.mean([r['confidence'] for r in evaluation_results]):.3f}")
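The summary above counts outcomes but does not say whether each outcome was the desired one. Here is a minimal sketch of such a check, assuming you label some queries with the status you expect. The helper `status_agreement`, the `results` list, and the `expected` labels are all hypothetical and do not come from a real run.

```python
from typing import Dict, List

def status_agreement(results: List[Dict[str, str]], expected: Dict[str, str]) -> float:
    """Fraction of labelled queries whose status matches the expected label."""
    labelled = [r for r in results if r["query"] in expected]
    if not labelled:
        return 0.0
    hits = sum(1 for r in labelled if r["status"] == expected[r["query"]])
    return hits / len(labelled)

# Hypothetical results and labels, for illustration only.
results = [
    {"query": "What are the fees for the Quantum Investment Fund?", "status": "success"},
    {"query": "Tell me about the XYZ fund", "status": "success"},
]
expected = {
    "What are the fees for the Quantum Investment Fund?": "success",
    "Tell me about the XYZ fund": "escalated",
}

print(status_agreement(results, expected))  # 0.5
```

In practice you could pass `evaluation_results` as `results`, since each entry already carries `query` and `status` keys.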

## 6. Comparison with Baseline GPT-4

Let's compare our framework's response with a baseline GPT-4 response (without RAG or confidence thresholds).

In [None]:
def get_baseline_response(query: str) -> str:
    """Get baseline GPT-4 response without framework."""
    client = openai.OpenAI()
    response = client.chat.completions.create(
        model="gpt-4-turbo-preview",
        messages=[{"role": "user", "content": query}],
        temperature=0.0,
        max_tokens=500
    )
    return response.choices[0].message.content

# Compare on a specific query
test_query = "What are the fees for the Quantum Investment Fund?"

print(f"Query: {test_query}\n")
print("="*60)
print("BASELINE GPT-4 RESPONSE:")
print("="*60)
baseline_response = get_baseline_response(test_query)
print(baseline_response)
print(f"\nLength: {len(baseline_response)} characters")

print("\n" + "="*60)
print("FRAMEWORK RESPONSE:")
print("="*60)
framework_result = framework.process_query(test_query, verbose=False)
print(framework_result['response'])
print(f"\nLength: {len(framework_result['response'])} characters")
print(f"Confidence: {framework_result['confidence']:.3f}")
print(f"Sources: {', '.join(framework_result.get('sources', []))}")
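Response length alone says little about factual grounding. One rough heuristic, sketched below, is to check whether every number quoted in a response also appears in the source documents. This is an assumption-laden illustration rather than the evaluation method of the paper: the helper names and the sample response are hypothetical, and formatting differences (for example "0.75 percent" versus "0.75%") will produce false alarms.

```python
import re
from typing import List, Set

# Matches tokens such as "0.75%", "2%", or "$10,000".
NUMBER_PATTERN = re.compile(r"\$?\d+(?:,\d{3})*(?:\.\d+)?%?")

def numeric_claims(text: str) -> Set[str]:
    """Extract number-like tokens from text."""
    return set(NUMBER_PATTERN.findall(text))

def unsupported_numbers(response: str, sources: List[str]) -> Set[str]:
    """Numbers that appear in the response but in none of the source documents."""
    supported: Set[str] = set()
    for source in sources:
        supported |= numeric_claims(source)
    return numeric_claims(response) - supported

source = (
    "The Quantum Investment Fund has an annual management fee of 0.75% "
    "and a front-load fee of 2%. The minimum investment is $10,000."
)
# Hypothetical response containing one figure that is not in the source.
response = "The fund charges a 0.75% management fee and a 1.5% redemption fee."

print(unsupported_numbers(response, [source]))  # {'1.5%'}
```

An empty result does not prove a response is correct; it only means no quoted figure contradicts the sources at the string level.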

## 7. Custom Knowledge Base

You can add your own documents to the knowledge base. Try adding domain-specific information below.

In [None]:
# Add custom documents to the knowledge base
def add_custom_document():
    print("Add a custom document to the knowledge base\n")
    
    content = input("Enter document content: ")
    source = input("Enter source name: ")
    doc_type = input("Enter document type (product_info/compliance/general): ")
    
    new_doc = {
        "id": f"custom_{len(rag_system.documents) + 1}",
        "content": content,
        "metadata": {"source": source, "type": doc_type}
    }
    
    rag_system.documents.append(new_doc)
    print(f"\n✅ Document added successfully! Knowledge base now has {len(rag_system.documents)} documents.")
    
    # Test with a query
    test_query = input("\nEnter a query to test the new document (or press Enter to skip): ")
    if test_query:
        result = framework.process_query(test_query, verbose=True)
        print(f"\nResponse: {result['response']}")

# Uncomment to add a custom document
# add_custom_document()

## 8. Performance Metrics

Let's calculate some performance metrics similar to those reported in our paper.

In [None]:
def calculate_metrics(queries: List[str]):
    """Calculate performance metrics for the framework."""
    
    metrics = {
        "total_queries": len(queries),
        "successful": 0,
        "escalated": 0,
        "avg_confidence": [],
        "response_lengths": [],
        "domains": {"general": 0, "financial": 0, "compliance": 0}
    }
    
    for query in queries:
        result = framework.process_query(query, verbose=False)
        
        if result['status'] == 'success':
            metrics['successful'] += 1
        else:
            metrics['escalated'] += 1
        
        metrics['avg_confidence'].append(result['confidence'])
        metrics['response_lengths'].append(len(result['response']))
        metrics['domains'][result['domain']] += 1
    
    # Calculate summary statistics
    metrics['success_rate'] = metrics['successful'] / metrics['total_queries'] * 100
    metrics['escalation_rate'] = metrics['escalated'] / metrics['total_queries'] * 100
    metrics['avg_confidence'] = np.mean(metrics['avg_confidence'])
    metrics['avg_response_length'] = np.mean(metrics['response_lengths'])
    
    return metrics

# Calculate metrics
print("Calculating performance metrics...\n")
metrics = calculate_metrics(TEST_QUERIES)

# Display metrics
print("Framework Performance Metrics")
print("="*40)
print(f"Total Queries: {metrics['total_queries']}")
print(f"Success Rate: {metrics['success_rate']:.1f}%")
print(f"Escalation Rate: {metrics['escalation_rate']:.1f}%")
print(f"Average Confidence: {metrics['avg_confidence']:.3f}")
print(f"Average Response Length: {metrics['avg_response_length']:.0f} characters")
print("\nDomain Distribution:")
for domain, count in metrics['domains'].items():
    print(f"  {domain}: {count} queries")

## 9. Visualization

Let's create a simple visualization of the framework's performance.

In [None]:
import matplotlib.pyplot as plt

# Create performance visualization
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Plot 1: Success vs Escalation
axes[0].pie([metrics['successful'], metrics['escalated']], 
            labels=['Successful', 'Escalated'],
            autopct='%1.1f%%',
            colors=['#2ecc71', '#e74c3c'])
axes[0].set_title('Response Status Distribution')

# Plot 2: Domain Distribution
domains = list(metrics['domains'].keys())
counts = list(metrics['domains'].values())
axes[1].bar(domains, counts, color=['#3498db', '#f39c12', '#9b59b6'])
axes[1].set_title('Query Domain Distribution')
axes[1].set_xlabel('Domain')
axes[1].set_ylabel('Number of Queries')

# Plot 3: Confidence Distribution
confidence_data = [framework.process_query(q, verbose=False)['confidence'] for q in TEST_QUERIES]
axes[2].hist(confidence_data, bins=10, color='#1abc9c', edgecolor='black')
axes[2].axvline(x=0.75, color='r', linestyle='--', label='General Threshold')
axes[2].axvline(x=0.80, color='orange', linestyle='--', label='Financial Threshold')
axes[2].axvline(x=0.85, color='purple', linestyle='--', label='Compliance Threshold')
axes[2].set_title('Confidence Score Distribution')
axes[2].set_xlabel('Confidence Score')
axes[2].set_ylabel('Frequency')
axes[2].legend()

plt.tight_layout()
plt.show()

print("\nVisualization complete! The charts show:")
print("1. Distribution of successful vs escalated responses")
print("2. Distribution of queries across different domains")
print("3. Confidence score distribution with domain-specific thresholds")

## 10. Summary and Next Steps

Congratulations! You've successfully run the Multi-Layered Hallucination Mitigation Framework. Here's what we've demonstrated:

### Key Features:
- **Domain Classification**: Automatically categorizes queries and applies appropriate thresholds
- **RAG Integration**: Grounds responses in verified knowledge base documents
- **Confidence-Based Escalation**: Knows when to defer to human experts
- **Prompt Engineering**: Uses role-playing and chain-of-thought for better responses

### Results:
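- Answers are grounded in retrieved knowledge base documents rather than in the model's memory alone, which is the mechanism the framework relies on to reduce hallucinations relative to baseline GPT-4
- Framework and baseline responses can be compared side by side, including response length (Section 6)
- Queries whose retrieval confidence falls below the domain threshold are escalated to a human instead of being answered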

### Next Steps:
1. **Expand the Knowledge Base**: Add more domain-specific documents
2. **Fine-tune Thresholds**: Adjust confidence thresholds based on your use case
3. **Production Deployment**: Integrate with your existing systems
4. **Monitor Performance**: Track metrics over time
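For the threshold tuning step, one simple starting point is to sweep candidate thresholds over recorded confidence scores and see how the escalation rate responds. The helper `escalation_sweep` and the scores below are hypothetical; in practice you would use confidences collected from your own queries.

```python
from typing import List, Tuple

def escalation_sweep(confidences: List[float], thresholds: List[float]) -> List[Tuple[float, float]]:
    """For each threshold, return the share of queries that would be escalated."""
    n = len(confidences)
    return [
        (t, (sum(c < t for c in confidences) / n) if n else 0.0)
        for t in thresholds
    ]

# Hypothetical confidence scores; 0.0 represents a query with no retrieved documents.
confidences = [0.91, 0.84, 0.78, 0.72, 0.0]

for threshold, rate in escalation_sweep(confidences, [0.75, 0.80, 0.85]):
    print(f"threshold={threshold:.2f} escalation_rate={rate:.0%}")
```

On these scores the escalation rate rises from 40% to 60% to 80% as the threshold moves from 0.75 to 0.85, which illustrates the trade-off between answering more queries and deferring more of them to human review.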

### Citation:
If you use this framework in your research or production systems, please cite:

```bibtex
@article{hiriyanna2025multilayered,
    title={Multi-Layered Framework for LLM Hallucination Mitigation in High-Stakes Applications: A Tutorial},
    author={Hiriyanna, Sachin and Zhao, Wenbing},
    journal={Computers},
    publisher={MDPI},
    year={2025}
}
```

For questions or support, contact:
- Sachin Hiriyanna: sachinh@ieee.org
- Wenbing Zhao: wenbing@ieee.org