# RAG System with ScaleDown Prompt Optimization

This notebook demonstrates how to build a Retrieval-Augmented Generation (RAG) system using ScaleDown's prompt optimization features to improve response quality and reduce hallucinations.

## Features Covered:
- Document embedding and retrieval
- RAG prompt optimization with uncertainty quantification
- Chain-of-verification for fact-checking
- Expert persona optimization for domain-specific responses

In [None]:
# Install required packages
!pip install scaledown chromadb sentence-transformers openai

In [None]:
import os
import json
from typing import List, Dict, Any
import chromadb
from sentence_transformers import SentenceTransformer
import openai

# Import ScaleDown
from scaledown import ScaleDown, optimize_prompt, parse_optimizers
from scaledown.tools import tools

## Setup Configuration

In [None]:
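# Configure API keys. Prefer setting OPENAI_API_KEY in your shell environment;
# the placeholder below is only used when the variable is not already set.
os.environ.setdefault('OPENAI_API_KEY', 'your-openai-api-key')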

# Initialize ScaleDown with optimization features
sd = ScaleDown(enable_optimization_styles=True)
sd.select_model('gpt-4')

# Initialize embedding model and vector database
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
chroma_client = chromadb.Client()
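# get_or_create_collection lets this cell be re-run without a "collection already exists" error
collection = chroma_client.get_or_create_collection("knowledge_base")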

## 1. Document Ingestion and Embedding

In [None]:
# Sample knowledge base documents
documents = [
    {
        "id": "doc1",
        "text": "Artificial Intelligence (AI) is a branch of computer science that aims to create intelligent machines. Machine Learning is a subset of AI that enables computers to learn and improve from experience without being explicitly programmed.",
        "metadata": {"topic": "AI_basics", "source": "tech_overview"}
    },
    {
        "id": "doc2", 
        "text": "Deep Learning is a subset of machine learning that uses artificial neural networks with multiple layers. It has revolutionized fields like computer vision, natural language processing, and speech recognition.",
        "metadata": {"topic": "deep_learning", "source": "tech_overview"}
    },
    {
        "id": "doc3",
        "text": "Natural Language Processing (NLP) enables computers to understand, interpret, and generate human language. Modern NLP uses transformer architectures like BERT and GPT for various tasks.",
        "metadata": {"topic": "NLP", "source": "tech_overview"}
    },
    {
        "id": "doc4",
        "text": "Retrieval-Augmented Generation (RAG) combines information retrieval with text generation. It retrieves relevant documents from a knowledge base and uses them to generate more accurate and informed responses.",
        "metadata": {"topic": "RAG", "source": "advanced_ai"}
    }
]

def ingest_documents(documents: List[Dict]):
    """Embed and store documents in vector database"""
    texts = [doc["text"] for doc in documents]
    embeddings = embedding_model.encode(texts).tolist()
    
    collection.add(
        embeddings=embeddings,
        documents=texts,
        metadatas=[doc["metadata"] for doc in documents],
        ids=[doc["id"] for doc in documents]
    )
    
    print(f"✅ Ingested {len(documents)} documents")

# Ingest sample documents
ingest_documents(documents)
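
Because the vector store keys every entry by its `id`, it can help to check a batch before embedding it. The sketch below is one way to do that; `validate_documents` is a hypothetical helper (it is not part of ChromaDB or ScaleDown), and it only assumes the `id`/`text` dictionary layout used by the sample documents above.

```python
from typing import Dict, List


def validate_documents(documents: List[Dict]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the batch looks fine.

    Hypothetical helper: checks only the 'id' and 'text' keys used in this notebook.
    """
    problems: List[str] = []
    seen_ids = set()
    for index, doc in enumerate(documents):
        doc_id = doc.get("id")
        if not doc_id:
            problems.append(f"document {index}: missing 'id'")
        elif doc_id in seen_ids:
            problems.append(f"document {index}: duplicate id '{doc_id}'")
        else:
            seen_ids.add(doc_id)
        # Treat whitespace-only text as empty, since it would embed to noise
        if not str(doc.get("text", "")).strip():
            problems.append(f"document {index}: empty 'text'")
    return problems


toy_batch = [
    {"id": "a", "text": "First document."},
    {"id": "a", "text": "Same id as the first one."},
    {"id": "b", "text": "   "},
]
for problem in validate_documents(toy_batch):
    print(problem)
```

Calling a check like this before `ingest_documents` keeps malformed entries out of the collection instead of surfacing them later as confusing retrieval results.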

## 2. Document Retrieval System

In [None]:
def retrieve_documents(query: str, top_k: int = 3) -> List[Dict]:
    """Retrieve most relevant documents for a query"""
    query_embedding = embedding_model.encode([query]).tolist()
    
    results = collection.query(
        query_embeddings=query_embedding,
        n_results=top_k
    )
    
    retrieved_docs = []
    for i, doc_text in enumerate(results['documents'][0]):
        retrieved_docs.append({
            "text": doc_text,
            "metadata": results['metadatas'][0][i],
            "distance": results['distances'][0][i]
        })
    
    return retrieved_docs

# Test retrieval
test_query = "What is machine learning?"
retrieved = retrieve_documents(test_query)

print(f"Query: {test_query}")
print("\nRetrieved documents:")
for i, doc in enumerate(retrieved):
    print(f"{i+1}. {doc['text'][:100]}... (distance: {doc['distance']:.3f})")
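
To make the ranking step concrete, the sketch below reproduces a nearest-neighbor lookup in plain Python, with hand-written toy vectors standing in for real embeddings. The helper names (`cosine_distance`, `top_k_by_distance`) and the vectors are hypothetical and not part of ChromaDB or sentence-transformers. The distance metric the vector database actually applies depends on how the collection is configured, so cosine distance is used here only as an illustrative assumption.

```python
import math
from typing import Dict, List, Tuple


def cosine_distance(a: List[float], b: List[float]) -> float:
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k_by_distance(
    query_vec: List[float], doc_vecs: Dict[str, List[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Score every document against the query and keep the k closest."""
    scored = [(doc_id, cosine_distance(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1])  # smaller distance = more similar
    return scored[:k]


# Toy 3-dimensional "embeddings" (illustrative values only)
toy_vectors = {
    "doc1": [1.0, 0.0, 0.0],
    "doc2": [0.8, 0.6, 0.0],
    "doc3": [0.0, 1.0, 0.0],
    "doc4": [0.0, 0.0, 1.0],
}
toy_query = [0.9, 0.1, 0.0]

ranked = top_k_by_distance(toy_query, toy_vectors, k=2)
for doc_id, dist in ranked:
    print(f"{doc_id}: distance={dist:.3f}")
```

A vector database performs the same ordering, but with an approximate index so that it does not have to score every stored document.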

## 3. Basic RAG Implementation

In [None]:
def basic_rag(query: str, top_k: int = 3) -> str:
    """Basic RAG without optimization"""
    # Retrieve relevant documents
    retrieved_docs = retrieve_documents(query, top_k)
    
    # Create context from retrieved documents
    context = "\n\n".join([doc["text"] for doc in retrieved_docs])
    
    # Basic prompt template
    prompt = f"""Context:
{context}

Question: {query}

Please answer the question based on the provided context."""
    
    # Use ScaleDown's tools API
    result = tools(llm='gpt-4', optimiser='none')
    llm_provider = result['llm_provider']
    
    response = llm_provider.call_llm(prompt, max_tokens=300)
    return response

# Test basic RAG
query = "How does deep learning relate to artificial intelligence?"
basic_response = basic_rag(query)
print(f"Query: {query}")
print(f"\nBasic RAG Response:\n{basic_response}")

## 4. Optimized RAG with ScaleDown

In [None]:
def optimized_rag(query: str, top_k: int = 3, optimization_style: str = "verified_expert") -> Dict[str, Any]:
    """Enhanced RAG with ScaleDown optimization"""
    # Retrieve relevant documents
    retrieved_docs = retrieve_documents(query, top_k)
    
    # Create enhanced context with metadata
    context_parts = []
    for i, doc in enumerate(retrieved_docs):
        source = doc['metadata'].get('source', 'unknown')
        topic = doc['metadata'].get('topic', 'general')
        context_parts.append(f"Source {i+1} (Topic: {topic}, Source: {source}):\n{doc['text']}")
    
    context = "\n\n".join(context_parts)
    
    # Enhanced prompt template
    base_prompt = f"""You are answering a question using provided context from a knowledge base.

CONTEXT:
{context}

QUESTION: {query}

Please provide a comprehensive answer based on the context. If the context doesn't contain sufficient information, clearly state what information is missing."""
    
    # Apply ScaleDown optimization
    if optimization_style == "custom":
        # Custom optimization pipeline
        optimizers = parse_optimizers('expert_persona,cot,uncertainty,cove')
        optimized_prompt = optimize_prompt(base_prompt, optimizers)
    else:
        # Use pre-built optimization style
        styles = sd.get_optimization_styles()
        if optimization_style in [style['name'] for style in styles]:
            result = sd.apply_optimization_style(base_prompt, optimization_style)
            optimized_prompt = result['optimized_prompt']
        else:
            optimized_prompt = base_prompt
    
    # Generate response with optimized prompt
    response_result = sd.optimize_and_call_llm(
        question=optimized_prompt,
        optimizers=[],  # Already optimized
        max_tokens=400
    )
    
    return {
        "query": query,
        "retrieved_docs": retrieved_docs,
        "base_prompt": base_prompt,
        "optimized_prompt": optimized_prompt,
        "response": response_result['llm_response'],
        "optimization_metrics": response_result.get('optimization_metrics', {})
    }

# Test optimized RAG with different styles
query = "How does deep learning relate to artificial intelligence?"

print("=" * 50)
print("OPTIMIZED RAG COMPARISON")
print("=" * 50)

# Test with verified expert style
result1 = optimized_rag(query, optimization_style="verified_expert")
print("\n📚 VERIFIED EXPERT STYLE:")
print(f"Response: {result1['response']}")

# Test with custom optimization pipeline
result2 = optimized_rag(query, optimization_style="custom")
print("\n🔧 CUSTOM OPTIMIZATION PIPELINE:")
print(f"Response: {result2['response']}")

## 5. Advanced RAG with Uncertainty and Verification

In [None]:
def advanced_rag_with_verification(query: str, confidence_threshold: float = 0.8) -> Dict[str, Any]:
    """Advanced RAG with uncertainty quantification and fact verification"""
    
    # Step 1: Retrieve documents
    retrieved_docs = retrieve_documents(query, top_k=3)
    context = "\n\n".join([doc["text"] for doc in retrieved_docs])
    
    # Step 2: Generate initial response with uncertainty quantification
    uncertainty_prompt = f"""Context: {context}

Question: {query}

Please answer the question and provide a confidence score (0-1) for your answer."""
    
    uncertainty_optimizers = parse_optimizers('expert_persona,uncertainty')
    optimized_uncertainty_prompt = optimize_prompt(uncertainty_prompt, uncertainty_optimizers)
    
    initial_result = sd.optimize_and_call_llm(
        question=optimized_uncertainty_prompt,
        optimizers=[],
        max_tokens=300
    )
    
    initial_response = initial_result['llm_response']
    
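    # Step 3: Apply chain-of-verification.
    # Note: this step currently always runs; confidence_threshold is accepted
    # but not yet used to decide whether verification is needed.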
    verification_prompt = f"""Original Question: {query}
Context: {context}
Initial Answer: {initial_response}

Please verify the accuracy of the initial answer by checking it against the provided context. Identify any potential errors or unsupported claims."""
    
    verification_optimizers = parse_optimizers('expert_persona,cove')
    optimized_verification_prompt = optimize_prompt(verification_prompt, verification_optimizers)
    
    verification_result = sd.optimize_and_call_llm(
        question=optimized_verification_prompt,
        optimizers=[],
        max_tokens=300
    )
    
    # Step 4: Generate final verified response
    final_prompt = f"""Based on the verification analysis, provide a final, accurate answer to: {query}
    
Context: {context}
Initial Answer: {initial_response}
Verification Analysis: {verification_result['llm_response']}

Please provide the most accurate final answer."""
    
    final_optimizers = parse_optimizers('expert_persona,cot')
    optimized_final_prompt = optimize_prompt(final_prompt, final_optimizers)
    
    final_result = sd.optimize_and_call_llm(
        question=optimized_final_prompt,
        optimizers=[],
        max_tokens=400
    )
    
    return {
        "query": query,
        "retrieved_docs": retrieved_docs,
        "initial_response": initial_response,
        "verification_analysis": verification_result['llm_response'],
        "final_response": final_result['llm_response'],
        "processing_steps": [
            "Document Retrieval",
            "Initial Response with Uncertainty",
            "Chain-of-Verification",
            "Final Verified Response"
        ]
    }

# Test advanced RAG
complex_query = "What are the key differences between machine learning and deep learning, and how do they both relate to NLP?"
advanced_result = advanced_rag_with_verification(complex_query)

print("=" * 60)
print("ADVANCED RAG WITH VERIFICATION")
print("=" * 60)
print(f"\nQuery: {complex_query}")
print(f"\n🔍 Initial Response:\n{advanced_result['initial_response']}")
print(f"\n✅ Verification Analysis:\n{advanced_result['verification_analysis']}")
print(f"\n🎯 Final Verified Response:\n{advanced_result['final_response']}")
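
`advanced_rag_with_verification` accepts a `confidence_threshold` but runs the verification step for every query. The sketch below shows one way the threshold could gate that step by reading the confidence score out of the initial response. Both helpers (`extract_confidence`, `needs_verification`) are hypothetical, and the regular expression assumes the model writes something like `Confidence: 0.85`. Neither ScaleDown nor the LLM guarantees that format, so a missing score is treated as "verify anyway".

```python
import re
from typing import Optional

# Matches "Confidence: 0.85", "confidence score = 1", etc. (assumed output format)
_CONFIDENCE_RE = re.compile(
    r"confidence(?:\s+score)?\s*[:=]\s*([01](?:\.\d+)?)(?!\d)",
    re.IGNORECASE,
)


def extract_confidence(response_text: str) -> Optional[float]:
    """Return a confidence in [0, 1] if one can be found, otherwise None."""
    match = _CONFIDENCE_RE.search(response_text)
    if match is None:
        return None
    value = float(match.group(1))
    return value if 0.0 <= value <= 1.0 else None


def needs_verification(response_text: str, confidence_threshold: float = 0.8) -> bool:
    """Verify when the score is missing or falls below the threshold."""
    confidence = extract_confidence(response_text)
    return confidence is None or confidence < confidence_threshold


for reply in [
    "Deep learning is a subset of ML. Confidence: 0.95",
    "It may relate to NLP. Confidence score: 0.6",
    "An answer with no score at all.",
]:
    print(needs_verification(reply), "<-", reply)
```

With helpers like these, Steps 3 and 4 could be skipped when `needs_verification(initial_response, confidence_threshold)` is false, which would save two LLM calls on high-confidence answers.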

## 6. RAG Performance Comparison

In [None]:
def compare_rag_approaches(queries: List[str]) -> Dict[str, List[str]]:
    """Compare different RAG approaches"""
    results = {
        "queries": queries,
        "basic_rag": [],
        "optimized_rag": [],
        "advanced_rag": []
    }
    
    for query in queries:
        print(f"\nProcessing: {query}")
        
        # Basic RAG
        basic_resp = basic_rag(query)
        results["basic_rag"].append(basic_resp)
        
        # Optimized RAG
        opt_result = optimized_rag(query, optimization_style="verified_expert")
        results["optimized_rag"].append(opt_result['response'])
        
        # Advanced RAG
        adv_result = advanced_rag_with_verification(query)
        results["advanced_rag"].append(adv_result['final_response'])
    
    return results

# Test queries
test_queries = [
    "What is the relationship between AI and machine learning?",
    "How does RAG improve AI responses?",
    "What are the applications of deep learning in NLP?"
]

# Run comparison (uncomment to execute)
# comparison_results = compare_rag_approaches(test_queries)

print("\n📊 RAG COMPARISON FRAMEWORK READY")
print("Uncomment the comparison_results line to run full comparison")
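
Once `compare_rag_approaches` has been run, its output is a dictionary of response lists keyed by approach. The sketch below shows one way to condense that into a per-approach figure. `summarize_comparison` is a hypothetical helper, the mock data is invented so the cell runs without any LLM calls, and average word count is only a crude proxy for verbosity, not a measure of answer quality.

```python
from typing import Dict, List


def summarize_comparison(results: Dict[str, List[str]]) -> Dict[str, float]:
    """Return the average response length in words for each approach.

    Skips the 'queries' entry and any approach with no responses.
    """
    summary: Dict[str, float] = {}
    for approach, responses in results.items():
        if approach == "queries" or not responses:
            continue
        word_counts = [len(response.split()) for response in responses]
        summary[approach] = sum(word_counts) / len(word_counts)
    return summary


# Mock data shaped like the return value of compare_rag_approaches
mock_results = {
    "queries": ["q1", "q2"],
    "basic_rag": ["short answer", "another short answer"],
    "optimized_rag": ["a somewhat longer answer here", "a longer answer"],
    "advanced_rag": [],
}

for approach, avg_words in summarize_comparison(mock_results).items():
    print(f"{approach}: {avg_words:.1f} words on average")
```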

## 7. Evaluation Metrics

In [None]:
def evaluate_rag_response(query: str, response: str, retrieved_docs: List[Dict]) -> Dict[str, Any]:
    """Evaluate RAG response quality"""
    
    # Evaluation prompt using ScaleDown optimization
    eval_prompt = f"""Evaluate the following RAG system response on a scale of 1-10 for each criterion:

Query: {query}
Response: {response}
Retrieved Context: {[doc['text'][:100] + '...' for doc in retrieved_docs]}

Criteria:
1. Accuracy: How factually correct is the response?
2. Relevance: How well does it answer the specific question?
3. Completeness: Does it provide comprehensive coverage?
4. Coherence: Is the response well-structured and logical?
5. Context Usage: How well does it utilize the retrieved context?

Provide scores as: Accuracy: X, Relevance: Y, Completeness: Z, Coherence: A, Context Usage: B"""
    
    # Use expert persona for evaluation
    eval_optimizers = parse_optimizers('expert_persona')
    optimized_eval_prompt = optimize_prompt(eval_prompt, eval_optimizers)
    
    eval_result = sd.optimize_and_call_llm(
        question=optimized_eval_prompt,
        optimizers=[],
        max_tokens=200
    )
    
    # Keep the raw evaluation text; the scores are not parsed into numbers here
    eval_text = eval_result['llm_response']
    
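    return {
        "evaluation_text": eval_text,
        "retrieval_count": len(retrieved_docs),
        "response_length": len(response),
        # Guard against an empty retrieval result to avoid ZeroDivisionError
        "avg_doc_distance": (
            sum(doc['distance'] for doc in retrieved_docs) / len(retrieved_docs)
            if retrieved_docs else float('nan')
        )
    }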

# Example evaluation
sample_query = "What is machine learning?"
sample_docs = retrieve_documents(sample_query)
sample_response = "Machine learning is a subset of AI that enables computers to learn from experience."

evaluation = evaluate_rag_response(sample_query, sample_response, sample_docs)
print("\n📈 EVALUATION RESULTS:")
print(f"Query: {sample_query}")
print(f"Evaluation: {evaluation['evaluation_text']}")
print(f"Metrics: {evaluation['retrieval_count']} docs, avg distance: {evaluation['avg_doc_distance']:.3f}")
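
The evaluation prompt asks for scores in the form `Accuracy: X, Relevance: Y, ...`, but `evaluate_rag_response` returns the judge's reply as raw text. The sketch below shows one way to turn that reply into numbers. `parse_eval_scores` is a hypothetical helper, and it assumes the model follows the requested format; criteria that are missing or outside the 1-10 range are simply left out of the result.

```python
import re
from typing import Dict

# Criterion names copied from the evaluation prompt above
CRITERIA = ["Accuracy", "Relevance", "Completeness", "Coherence", "Context Usage"]


def parse_eval_scores(eval_text: str) -> Dict[str, float]:
    """Extract 'Criterion: number' pairs, keeping only scores between 1 and 10."""
    scores: Dict[str, float] = {}
    for criterion in CRITERIA:
        pattern = rf"{re.escape(criterion)}\s*:\s*(\d+(?:\.\d+)?)"
        match = re.search(pattern, eval_text, flags=re.IGNORECASE)
        if match:
            value = float(match.group(1))
            if 1.0 <= value <= 10.0:
                scores[criterion] = value
    return scores


sample_eval_text = (
    "Accuracy: 9, Relevance: 8, Completeness: 6, Coherence: 9, Context Usage: 7.5"
)
parsed_scores = parse_eval_scores(sample_eval_text)
print(parsed_scores)
```

Numeric scores like these can then be averaged across queries, which the free-text evaluation does not allow.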

## Summary

This notebook demonstrated:

1. **Basic RAG Implementation**: Simple retrieval + generation
2. **ScaleDown Optimization**: Enhanced prompts with expert persona, CoT, uncertainty
3. **Advanced Verification**: Chain-of-verification for fact-checking
4. **Performance Comparison**: A framework for collecting responses from all three approaches side by side
5. **Evaluation Framework**: Metrics for response quality assessment

### Key Benefits of ScaleDown in RAG:
- **Reduced Hallucinations**: Uncertainty quantification and verification
- **Improved Accuracy**: Expert persona and domain-specific optimization
- **Better Reasoning**: Chain-of-thought for complex queries
- **Flexible Optimization**: Mix and match optimizers for specific use cases

### Next Steps:
- Implement custom optimizers for your domain
- Add more sophisticated evaluation metrics
- Scale to larger knowledge bases
- Integrate with production systems