# RAG System Demo

This notebook demonstrates the project's Retrieval-Augmented Generation (RAG) system, including:
- Document ingestion and chunking
- Vector store integration
- Hybrid retrieval methods
- Question answering with context

## Setup

In [None]:
import sys

# Make the project's src/ directory importable (path is relative to the notebook's working directory)
sys.path.append("../../../src")

from ai import (
    ChunkingStrategy,
    Document,
    DocumentChunker,
    LLMFactory,
    RAGConfig,
    RAGPipeline,
    RetrievalMethod,
)

## Create Sample Documents

Let's create some sample documents about AI and machine learning:

In [None]:
# Sample documents about AI/ML topics
sample_docs = [
    Document(
        content="""
        Machine Learning Fundamentals

        Machine Learning is a subset of artificial intelligence that enables computers
        to learn and make decisions without being explicitly programmed. It uses
        algorithms to analyze data, identify patterns, and make predictions.

        Key types of machine learning include:
        1. Supervised Learning: Uses labeled data to train models
        2. Unsupervised Learning: Finds patterns in unlabeled data
        3. Reinforcement Learning: Learns through interaction and feedback

        Common algorithms include linear regression, decision trees, neural networks,
        support vector machines, and ensemble methods like random forests.
        """,
        metadata={
            "source": "ml_fundamentals.txt",
            "topic": "machine_learning",
            "difficulty": "beginner",
        },
    ),
    Document(
        content="""
        Deep Learning and Neural Networks

        Neural Networks are computing systems inspired by biological neural networks.
        They consist of layers of interconnected nodes (neurons) that process information
        through weighted connections. Deep learning uses multi-layer neural networks.

        Architecture components:
        - Input Layer: Receives data
        - Hidden Layers: Process information (deep networks have many)
        - Output Layer: Produces final results
        - Activation Functions: Introduce non-linearity (ReLU, Sigmoid, Tanh)

        Popular architectures include Convolutional Neural Networks (CNNs) for images,
        Recurrent Neural Networks (RNNs) for sequences, and Transformers for language.

        Training uses backpropagation to adjust weights based on error gradients.
        """,
        metadata={
            "source": "deep_learning.txt",
            "topic": "deep_learning",
            "difficulty": "intermediate",
        },
    ),
    Document(
        content="""
        Natural Language Processing (NLP)

        Natural Language Processing is a branch of AI that helps computers
        understand, interpret, and generate human language. It combines computational
        linguistics with machine learning and deep learning.

        Key NLP tasks include:
        - Text Classification: Categorizing text into predefined classes
        - Named Entity Recognition: Identifying entities like names, places, dates
        - Sentiment Analysis: Determining emotional tone of text
        - Machine Translation: Converting text between languages
        - Question Answering: Providing answers to natural language questions
        - Text Summarization: Creating concise summaries of longer texts

        Modern NLP heavily relies on transformer models like BERT, GPT, and T5.
        These models use attention mechanisms to understand context and relationships.
        """,
        metadata={
            "source": "nlp_overview.txt",
            "topic": "nlp",
            "difficulty": "intermediate",
        },
    ),
    Document(
        content="""
        Large Language Models and Generative AI

        Large Language Models (LLMs) are AI systems trained on vast amounts of text data
        to understand and generate human-like text. They represent a breakthrough in
        natural language understanding and generation.

        Key characteristics:
        - Scale: Billions or trillions of parameters
        - Emergent abilities: Capabilities that arise from scale
        - Few-shot learning: Can learn new tasks from few examples
        - Generalization: Apply knowledge to new domains

        Applications include:
        - Chatbots and virtual assistants
        - Code generation and programming assistance
        - Creative writing and content creation
        - Language translation and localization
        - Educational tutoring and explanation

        Popular models include GPT-4, Claude, PaLM, and LLaMA.
        """,
        metadata={
            "source": "llm_overview.txt",
            "topic": "llm",
            "difficulty": "advanced",
        },
    ),
]

print(f"Created {len(sample_docs)} sample documents:")
for doc in sample_docs:
    print(
        f"  • {doc.metadata['source']}: {doc.metadata['topic']} ({doc.metadata['difficulty']})"
    )
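The triple-quoted `content` strings above keep the indentation of the notebook source, so every line starts with eight spaces, and those spaces count toward chunk sizes. If you want indentation-free text before chunking, the standard library's `textwrap.dedent` handles it. This is a minimal sketch; `normalize_content` is a hypothetical helper, not part of the `ai` package.

```python
import textwrap


def normalize_content(text: str) -> str:
    """Remove the indentation shared by all lines, then trim surrounding blank lines."""
    # dedent strips the common leading-whitespace prefix of all non-blank lines
    return textwrap.dedent(text).strip()


raw = """
        Machine Learning Fundamentals

        Machine Learning is a subset of artificial intelligence.
        """
clean = normalize_content(raw)
print(repr(clean))
```

You could wrap each string as `content=normalize_content("""...""")` when building `sample_docs`; chunk lengths in the next section would then reflect only the actual text.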

## Document Chunking Demo

Let's compare fixed-size chunking (with overlap) and sentence-based chunking on the first document:

In [None]:
# Test different chunking strategies
chunker_fixed = DocumentChunker(
    chunk_size=300, chunk_overlap=50, strategy=ChunkingStrategy.FIXED_SIZE
)
chunker_sentence = DocumentChunker(chunk_size=300, strategy=ChunkingStrategy.SENTENCE)

# Chunk the first document with different strategies
test_doc = sample_docs[0]

print("Original document length:", len(test_doc.content))
print()

# Fixed-size chunking (top-level await works in Jupyter/IPython)
fixed_chunks = await chunker_fixed.chunk_document(test_doc)
print(f"Fixed-size chunking: {len(fixed_chunks)} chunks")
for i, chunk in enumerate(fixed_chunks[:2]):  # Show first 2
    print(f"  Chunk {i + 1}: {len(chunk.content)} chars")
    print(f"    Preview: {chunk.content[:100]}...")

print()

# Sentence-based chunking
sentence_chunks = await chunker_sentence.chunk_document(test_doc)
print(f"Sentence-based chunking: {len(sentence_chunks)} chunks")
for i, chunk in enumerate(sentence_chunks[:2]):  # Show first 2
    print(f"  Chunk {i + 1}: {len(chunk.content)} chars")
    print(f"    Preview: {chunk.content[:100]}...")
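To make the difference between the two strategies concrete, here is a minimal pure-Python sketch of what each one does, assuming sizes are measured in characters. `fixed_size_chunks` and `sentence_chunks` are illustrative helpers; `DocumentChunker` may split differently (for example on token counts, or with a smarter sentence splitter).

```python
import re
from typing import List


def fixed_size_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Slide a chunk_size window over the text, advancing chunk_size - chunk_overlap chars each step."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def sentence_chunks(text: str, chunk_size: int) -> List[str]:
    """Greedily pack whole sentences into chunks of at most chunk_size chars.

    A single sentence longer than chunk_size becomes its own oversized chunk.
    """
    # Naive sentence split: break after ., ! or ? when followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > chunk_size:
            chunks.append(current)  # current chunk is full; start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


print(fixed_size_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap repeats the tail of each fixed-size chunk at the head of the next, so a fact that straddles a boundary still appears whole in at least one chunk; sentence chunking avoids cutting mid-sentence at the cost of uneven chunk lengths.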

## RAG Pipeline Setup

Now let's create and configure a RAG pipeline:

In [None]:
# Create RAG configuration
rag_config = RAGConfig(
    chunk_size=500,
    chunk_overlap=100,
    retrieval_method=RetrievalMethod.VECTOR,  # Start with vector retrieval
    top_k=3,
    similarity_threshold=0.1,
    vector_store_type="memory",  # Use in-memory store for demo
    embedding_model="sentence-transformers/all-MiniLM-L6-v2",
)

print("RAG Configuration:")
print(
    f"  Chunking: {rag_config.chunk_size} chars with {rag_config.chunk_overlap} overlap"
)
print(f"  Retrieval: {rag_config.retrieval_method.value}")
print(f"  Top-K: {rag_config.top_k}")
print(f"  Vector Store: {rag_config.vector_store_type}")
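One plausible reading of `top_k` and `similarity_threshold` in vector mode: score every chunk by cosine similarity to the query, discard chunks below the threshold, and return the best `top_k` of the rest. The sketch below assumes exactly that semantics, including that the threshold is compared against cosine similarity; check `RAGPipeline` for its actual behavior. `top_k_above_threshold` is a hypothetical helper.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_above_threshold(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    top_k: int,
    similarity_threshold: float,
) -> List[Tuple[int, float]]:
    """Return (chunk index, score) pairs, best first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    # Drop weak matches, then keep at most top_k of what remains
    kept = [(i, s) for i, s in scored if s >= similarity_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


print(top_k_above_threshold([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], top_k=3, similarity_threshold=0.1))
```

Here only chunks 0 and 2 come back even though `top_k=3`: chunk 1 is orthogonal to the query (score 0.0) and falls below the 0.1 threshold. Under this reading, a low threshold mainly filters out unrelated chunks, while `top_k` controls how much context reaches the LLM.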

In [None]:
# Create the LLM provider (requires OPENAI_API_KEY to be set in the environment)
import os

llm_provider = LLMFactory.create(
    "openai",
    {
        "model": "gpt-3.5-turbo",
        "temperature": 0.3,  # Lower temperature for more focused answers
        "max_tokens": 1000,
        "api_key": os.environ.get("OPENAI_API_KEY"),  # Read from the environment; never hard-code keys
    },
)

print("LLM Provider created: OpenAI GPT-3.5-turbo")
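Because the key comes from the environment, a missing `OPENAI_API_KEY` may only surface later as an authentication error inside `query`. A small hypothetical helper, `require_env`, can fail early instead; it only uses the standard library and makes no assumption about how `LLMFactory` handles the key.

```python
import os


def require_env(name: str = "OPENAI_API_KEY") -> str:
    """Return the environment variable's value, or raise if it is missing or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before creating the LLM provider")
    return value
```

Calling `require_env()` before `LLMFactory.create(...)` turns a missing key into an immediate, readable error rather than a failure halfway through the question-answering loop.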

In [None]:
# Create RAG pipeline
rag_pipeline = RAGPipeline(rag_config, llm_provider)

print("✅ RAG Pipeline created successfully")

## Document Ingestion

Let's ingest our sample documents:

In [None]:
# Ingest documents
print("📥 Ingesting documents...")

await rag_pipeline.ingest_documents(sample_docs)

print("✅ Documents ingested successfully")
print(f"   Processed {len(sample_docs)} documents")
print("   Documents are now chunked and embedded in the vector store")
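Conceptually, ingestion chunks each document, embeds each chunk, and stores the vector alongside the chunk text and metadata. The sketch below is a toy in-memory version of that flow under explicit assumptions: `toy_embed` is a hashed bag-of-words stand-in for the sentence-transformers model configured above, and `InMemoryVectorStore` is a hypothetical class, not the store `RAGPipeline` uses internally.

```python
import math
import re
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

DIM = 1024  # size of the toy embedding space


def toy_embed(text: str) -> List[float]:
    """Hashed bag-of-words vector, unit-normalized; a stand-in for a real embedding model."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode()) % DIM] += 1.0  # crc32 is stable across runs
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class InMemoryVectorStore:
    chunk_size: int = 500
    chunk_overlap: int = 100
    entries: List[Tuple[str, Dict[str, str], List[float]]] = field(default_factory=list)

    def ingest(self, text: str, metadata: Dict[str, str]) -> int:
        """Chunk text with a sliding window, embed each chunk, store it; return the chunk count."""
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("chunk_overlap must be in [0, chunk_size)")
        step = self.chunk_size - self.chunk_overlap
        count = 0
        for start in range(0, max(len(text), 1), step):
            chunk = text[start:start + self.chunk_size]
            if chunk.strip():
                self.entries.append((chunk, metadata, toy_embed(chunk)))
                count += 1
            if start + self.chunk_size >= len(text):
                break
        return count

    def search(self, query: str, k: int = 3) -> List[Tuple[float, str, Dict[str, str]]]:
        """Return (score, chunk, metadata) for the k most similar chunks."""
        q = toy_embed(query)
        # Vectors are unit-normalized, so the dot product equals cosine similarity
        scored = [(sum(a * b for a, b in zip(q, vec)), chunk, meta) for chunk, meta, vec in self.entries]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:k]
```

Each stored chunk keeps a reference to its document's metadata, which is what lets the retrieval cell below print `source` and `topic` for every hit.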

## Retrieval Testing

Let's test document retrieval with different queries:

In [None]:
# Test retrieval with various queries
test_queries = [
    "What is machine learning?",
    "How do neural networks work?",
    "What are some NLP applications?",
    "Tell me about large language models",
]

for query in test_queries:
    print(f"🔍 Query: {query}")

    # Retrieve relevant documents
    retrieved_docs = await rag_pipeline.retrieve(query, k=2)

    print(f"📚 Retrieved {len(retrieved_docs)} documents:")
    for i, doc in enumerate(retrieved_docs):
        score = f" (score: {doc.score:.3f})" if doc.score is not None else ""
        source = doc.metadata.get("source", "unknown")
        topic = doc.metadata.get("topic", "unknown")
        print(f"  {i + 1}. {source} ({topic}){score}")
        print(f"     Preview: {doc.content[:150]}...")

    print("-" * 60)
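Retrieval operates on chunks, so two chunks from the same file can both land in the top-k list. If you prefer one entry per source, a hypothetical post-processing helper like `best_hit_per_source` keeps only each file's highest-scoring chunk. It works on plain `(source, score)` pairs, so it assumes nothing about the objects `retrieve` returns.

```python
from typing import Dict, List, Tuple


def best_hit_per_source(hits: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Keep the highest score seen for each source, ordered best first."""
    best: Dict[str, float] = {}
    for source, score in hits:
        if source not in best or score > best[source]:
            best[source] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```

You could build its input from the retrieval loop as `[(d.metadata.get("source", "unknown"), d.score or 0.0) for d in retrieved_docs]`.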

## Question Answering

Now let's test the full RAG pipeline with question answering:

In [None]:
# Test question answering
questions = [
    "What are the three main types of machine learning?",
    "What are the key components of a neural network architecture?",
    "What NLP tasks are commonly performed?",
    "What makes large language models special compared to smaller models?",
]

for question in questions:
    print(f"❓ Question: {question}")
    print()

    try:
        # Get answer from RAG pipeline
        answer = await rag_pipeline.query(question)

        print("💡 Answer:")
        print(answer)

    except Exception as e:
        print(f"❌ Error: {e}")

    print("=" * 80)
    print()
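`rag_pipeline.query` hides how the retrieved chunks reach the model. A common pattern is to number the chunks, label each with its source, and instruct the model to answer only from that context. The sketch below illustrates that general pattern under stated assumptions; it is not the prompt `RAGPipeline` actually sends, and `build_rag_prompt` and `max_context_chars` are hypothetical names.

```python
from typing import Dict, List


def build_rag_prompt(question: str, chunks: List[Dict[str, str]], max_context_chars: int = 2000) -> str:
    """Number each retrieved chunk, tag it with its source, and append the question."""
    parts: List[str] = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        block = f"[{i}] ({chunk['source']})\n{chunk['content'].strip()}"
        if used + len(block) > max_context_chars:
            break  # stay within a rough context budget; chunks are assumed best-first
        parts.append(block)
        used += len(block)
    context = "\n\n".join(parts)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

The numbered `[i] (source)` tags give the model something to cite, and the "say so" instruction is a cheap guard against answers invented from outside the retrieved context.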

## Hybrid Retrieval Demo

Let's test hybrid retrieval (combining vector and keyword search):

In [None]:
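# Create a hybrid RAG pipeline; every setting except retrieval_method matches rag_config
hybrid_config = RAGConfig(
    chunk_size=500,
    chunk_overlap=100,
    retrieval_method=RetrievalMethod.HYBRID,  # Combine vector and keyword retrieval
    top_k=3,
    similarity_threshold=0.1,
    vector_store_type="memory",
    embedding_model="sentence-transformers/all-MiniLM-L6-v2",
)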

hybrid_pipeline = RAGPipeline(hybrid_config, llm_provider)

# Ingest documents into hybrid pipeline
await hybrid_pipeline.ingest_documents(sample_docs)

print("✅ Hybrid RAG pipeline created and documents ingested")
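Hybrid retrieval pairs the vector ranking with a lexical one. BM25 is a widely used keyword scorer; whether `RetrievalMethod.HYBRID` uses BM25 or another keyword method is an assumption here. Below is a compact, self-contained Okapi BM25 with the commonly used `k1=1.5`, `b=0.75` settings and a non-negative IDF variant; `bm25_scores` is a hypothetical helper.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of every document for the query (higher is better)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs if n_docs else 0.0
    # Document frequency: in how many documents each term appears
    df = Counter(term for toks in tokenized for term in set(toks))
    scores: List[float] = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Rare terms get a larger IDF; the +1 inside log keeps it positive
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalization (b)
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores
```

Unlike embeddings, BM25 rewards exact term overlap, which is why a query such as "transformers attention mechanisms" is a useful probe: the literal words appear in the NLP and deep learning documents.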

In [None]:
# Compare vector vs hybrid retrieval
test_query = "transformers attention mechanisms"

print(f"🔍 Comparing retrieval methods for: '{test_query}'")
print()

# Vector retrieval
vector_docs = await rag_pipeline.retrieve(test_query)
print("📊 Vector Retrieval Results:")
for i, doc in enumerate(vector_docs):
    source = doc.metadata.get("source", "unknown")
    score = f" (score: {doc.score:.3f})" if doc.score is not None else ""
    print(f"  {i + 1}. {source}{score}")

print()

# Hybrid retrieval
hybrid_docs = await hybrid_pipeline.retrieve(test_query)
print("🔄 Hybrid Retrieval Results:")
for i, doc in enumerate(hybrid_docs):
    source = doc.metadata.get("source", "unknown")
    score = f" (score: {doc.score:.3f})" if doc.score is not None else ""
    print(f"  {i + 1}. {source}{score}")
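How the hybrid pipeline merges its vector and keyword result lists is not shown here. Reciprocal rank fusion (RRF) is one common, score-free way to do it: each list awards `1 / (k + rank)` to every document it contains, and the sums decide the final order. The sketch assumes RRF with the conventional `k = 60`; the pipeline may instead use a weighted sum of normalized scores, in which case its scores will look different.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids; earlier positions earn more credit."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Best fused score first; ties keep first-seen order because sorted() is stable
    return sorted(scores, key=lambda d: scores[d], reverse=True)


vector_ranking = ["deep_learning.txt", "nlp_overview.txt", "llm_overview.txt"]
keyword_ranking = ["nlp_overview.txt", "llm_overview.txt"]
print(reciprocal_rank_fusion([vector_ranking, keyword_ranking]))
# ['nlp_overview.txt', 'llm_overview.txt', 'deep_learning.txt']
```

A document ranked moderately well by both methods can overtake one ranked first by only one of them, which is the main effect to look for when comparing the two result lists above.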

## RAG Evaluation

Let's measure retrieval quality with precision, recall, and F1 against a small hand-labeled set of expected source files:

In [None]:
# Define evaluation queries with ground truth
eval_queries = [
    "What is supervised learning?",
    "How do transformers work in NLP?",
    "What are the layers in neural networks?",
]

# Ground truth: which source files should be retrieved for each query
ground_truth = [
    ["ml_fundamentals.txt"],  # Supervised learning is defined in ML fundamentals
    ["nlp_overview.txt", "deep_learning.txt"],  # Transformers appear in both the NLP and deep learning docs
    ["deep_learning.txt"],  # Neural network layers are described in the deep learning doc
]

# Run evaluation
try:
    eval_results = await rag_pipeline.evaluate_retrieval(eval_queries, ground_truth)

    print("📊 RAG Evaluation Results:")
    print(f"  Precision: {eval_results['precision']:.3f}")
    print(f"  Recall: {eval_results['recall']:.3f}")
    print(f"  F1 Score: {eval_results['f1']:.3f}")

except Exception as e:
    print(f"⚠️ Evaluation failed: {e}")
    print("This might be due to document IDs not being set properly")
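If `evaluate_retrieval` fails, or to see exactly what the numbers mean, precision, recall, and F1 can be computed directly from the `source` metadata of each query's retrieved documents against `ground_truth`. A minimal sketch, assuming set-based matching on filenames and macro-averaging over queries; `retrieval_prf` is a hypothetical helper, and `evaluate_retrieval` may define these metrics differently.

```python
from typing import Dict, Sequence


def retrieval_prf(
    retrieved: Sequence[Sequence[str]], relevant: Sequence[Sequence[str]]
) -> Dict[str, float]:
    """Macro-averaged precision and recall over queries, plus the F1 of those averages."""
    precisions, recalls = [], []
    for got, want in zip(retrieved, relevant):
        got_set, want_set = set(got), set(want)  # duplicate sources count once
        hits = len(got_set & want_set)
        precisions.append(hits / len(got_set) if got_set else 0.0)
        recalls.append(hits / len(want_set) if want_set else 0.0)
    if not precisions:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = sum(precisions) / len(precisions)
    recall = sum(recalls) / len(recalls)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

To use it, collect for each query in `eval_queries` the list of `source` values from `rag_pipeline.retrieve(query)` and pass those lists with `ground_truth`. Note that with `top_k=3` and a single relevant file per query, precision is capped at 1/3 whenever three distinct sources come back, so low precision here does not by itself mean poor retrieval.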

## Summary

This notebook demonstrated the RAG system capabilities:

### ✅ What We Built:
1. **Document Processing**: Chunking strategies (fixed-size, sentence-based)
2. **Vector Storage**: In-memory vector store with embeddings
3. **Retrieval Methods**: Vector similarity and hybrid search
4. **Question Answering**: Context-aware answer generation
5. **Evaluation**: Precision, recall, and F1 for retrieval quality

### 🚀 Next Steps:
- **Production Vector Stores**: ChromaDB, FAISS, Pinecone
- **Advanced Chunking**: Semantic chunking, document structure awareness
- **Reranking**: Add reranking models for better relevance
- **Query Expansion**: Enhance queries for better retrieval
- **Evaluation**: Rank-aware retrieval metrics (e.g., MRR, nDCG) and checks on answer quality

### 🔧 Configuration Options:
- Chunk size and overlap
- Retrieval methods (vector/keyword/hybrid)
- Similarity thresholds
- Top-K retrieval counts
- Embedding models