# Introduction to RAG (Retrieval Augmented Generation)

Retrieval Augmented Generation (RAG) pairs a large language model (LLM) with an external retrieval step. At query time, relevant text is fetched from a knowledge source and added to the prompt. This lets the model answer questions about data it wasn't trained on, such as your private documents.

## The Flow
1. **Chunking:** Break documents into smaller pieces.
2. **Embedding:** Convert each chunk into a numeric vector (an embedding).
3. **Retrieval:** Embed the user's question and find the chunks whose vectors are most similar to it.
4. **Generation:** Pass the retrieved chunks plus the question to the LLM, which writes the answer.
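Step 1 is not shown in the cells below. Each mock document there is already a single sentence, so it doubles as a chunk. Here is a minimal sketch of chunking with overlap. It assumes fixed-size windows counted in words, although real pipelines often split on tokens, sentences, or headings instead. `chunk_words` and its `chunk_size`/`overlap` parameters are hypothetical names used only for illustration.

```python
def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Hypothetical helper: split text into windows of `chunk_size` words.

    Consecutive windows share `overlap` words. Counting words (not tokens)
    is a simplifying assumption for this sketch.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


text = "one two three four five six seven eight nine ten"
print(chunk_words(text, chunk_size=4, overlap=1))
```

Overlap reduces the chance that a fact is cut in half at a chunk boundary. The cost is that some words are stored twice.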

```python
# 1. THE KNOWLEDGE BASE (Mock Data)
# In a real app, this would be your PDF/Database content, split into chunks.
# Here each document is a single sentence, so it already works as one chunk.
documents = [
    "RAG stands for Retrieval Augmented Generation.",
    "Fine-tuning updates the model weights, while RAG adds context at inference time.",
    "Vector databases are commonly used to store embeddings for RAG systems.",
    "Prompt engineering focuses on optimizing instructions sent to the model."
]
```

## 2 & 3. Embedding and Retrieval (Simplified)

In production, you'd use an embedding model such as OpenAI's `text-embedding-3-small` or a Hugging Face model. Here, we skip real embeddings and use simple keyword overlap (Jaccard similarity between word sets) to demonstrate the *concept* of finding relevant chunks.
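The overlap score is the Jaccard similarity between the set of query words $Q$ and the set of document words $D$. It ranges from 0 (no shared words) to 1 (identical word sets):

$$
J(Q, D) = \frac{|Q \cap D|}{|Q \cup D|} = \frac{|Q \cap D|}{|Q| + |D| - |Q \cap D|}
$$

As a worked example, count distinct lowercase words with punctuation stripped. The query "How is RAG different from fine-tuning?" has 6 words, and the fine-tuning sentence in the knowledge base has 12. They share 2 words (`rag`, `fine-tuning`), so

$$
J = \frac{2}{6 + 12 - 2} = \frac{2}{16} = 0.125
$$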

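```python
import re


def tokenize(text):
    # Lowercase and keep runs of letters, digits, and hyphens, so "fine-tuning?"
    # becomes "fine-tuning". With a plain .split(), trailing punctuation blocks the
    # match and the query below retrieves the wrong sentence.
    return set(re.findall(r"[a-z0-9-]+", text.lower()))


def simple_retrieve(query, docs):
    query_terms = tokenize(query)
    scores = []

    for doc in docs:
        doc_terms = tokenize(doc)
        # Jaccard similarity: shared words / all distinct words (0.0 if both are empty)
        union = query_terms | doc_terms
        score = len(query_terms & doc_terms) / len(union) if union else 0.0
        scores.append((score, doc))

    # Sort by score, highest first
    scores.sort(key=lambda x: x[0], reverse=True)
    return scores[0][1]  # Return the best match


query = "How is RAG different from fine-tuning?"
best_context = simple_retrieve(query, documents)

print(f"Query: {query}")
print(f"Retrieved Context: {best_context}")
```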
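The keyword-overlap retriever never turns text into vectors, so it skips step 2 of the flow. The sketch below approximates what the embedding step adds by using bag-of-words count vectors as a toy stand-in for a learned embedding model. This is an assumption for illustration only: real embeddings such as `text-embedding-3-small` capture meaning, not just shared words. Chunks are ranked by cosine similarity, and the top `k` are returned. `embed`, `cosine`, and `retrieve_top_k` are hypothetical helper names.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy "embedding" (assumption): a sparse vector of lowercase word counts, not a learned model
    return Counter(re.findall(r"[a-z0-9-]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity: dot(a, b) / (|a| * |b|), or 0.0 if either vector is empty
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(query: str, docs: list[str], k: int = 2) -> list[tuple[float, str]]:
    # Embed every chunk once; in a real app a vector database would store these
    doc_vectors = [(embed(doc), doc) for doc in docs]
    query_vector = embed(query)
    scored = [(cosine(query_vector, vec), doc) for vec, doc in doc_vectors]
    # Highest similarity first; keep the k best chunks
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


docs = [
    "RAG stands for Retrieval Augmented Generation.",
    "Fine-tuning updates the model weights, while RAG adds context at inference time.",
    "Vector databases are commonly used to store embeddings for RAG systems.",
    "Prompt engineering focuses on optimizing instructions sent to the model.",
]
for score, doc in retrieve_top_k("How is RAG different from fine-tuning?", docs, k=2):
    print(f"{score:.3f}  {doc}")
```

Swapping `embed` for a real embedding model and the list for a vector database keeps the same shape: embed once, compare vectors, take the top `k`.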

## 4. Generation

Now we combine the retrieved context and the question into a single prompt. This notebook stops at printing the prompt. In a real app, you would send it to the LLM.

```python
def generate_rag_prompt(query, context):
    # Ground the answer in the retrieved context and give the model an explicit fallback
    return f"""Context Information:
{context}

Question: {query}

Instructions: Answer the question using ONLY the provided context information.
If the answer is not in the context, say "I don't know."
"""


final_prompt = generate_rag_prompt(query, best_context)
print("--- Final Prompt Sent to LLM ---")
print(final_prompt)
```
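The prompt cell only builds and prints the prompt. The sketch below covers the remaining call. It assumes the LLM is wrapped as a plain Python function that takes a prompt string and returns text. That interface, along with `build_prompt`, `answer_with_rag`, and `fake_llm`, is hypothetical. In practice, the function would wrap your provider's chat or completions client. The sketch also numbers several retrieved chunks so the model can tell them apart.

```python
from typing import Callable


def build_prompt(query: str, chunks: list[str]) -> str:
    # Number each retrieved chunk: [1] ..., [2] ...
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Context Information:\n"
        f"{context}\n\n"
        f"Question: {query}\n\n"
        "Instructions: Answer the question using ONLY the provided context information.\n"
        'If the answer is not in the context, say "I don\'t know."\n'
    )


def answer_with_rag(query: str, chunks: list[str], llm: Callable[[str], str]) -> str:
    if not chunks:
        # Nothing retrieved: return the same fallback the prompt asks the model to use
        return "I don't know."
    # Generation step: send the grounded prompt to whatever LLM function is supplied
    return llm(build_prompt(query, chunks))


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in so the sketch runs offline: echoes the first context line
    return f"(fake LLM) Based on {prompt.splitlines()[1]}"


chunks = ["Fine-tuning updates the model weights, while RAG adds context at inference time."]
print(answer_with_rag("How is RAG different from fine-tuning?", chunks, fake_llm))
```

Short-circuiting on an empty retrieval avoids paying for an LLM call that can only answer "I don't know."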

---
## Next Steps

For a production-grade implementation including proper evaluation metrics (Precision, Recall, Drift), check out our **Core RAG Evaluation Project** in this repository:

`projects/core/01_rag_evaluation_pipeline`
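For a quick preview of the retrieval side of those metrics, here is a minimal sketch of precision@k and recall@k. It is not the project's implementation. It assumes a hand-labeled set of relevant chunk IDs for each query, and the function names and IDs are hypothetical. Drift is not covered here.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    # Fraction of the top-k slots filled by relevant chunks
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    # Fraction of all relevant chunks that appear in the top-k results
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)


retrieved = ["doc2", "doc1", "doc3"]  # hypothetical ranked output of a retriever
relevant = {"doc2"}  # hypothetical hand-labeled ground truth for this query
print(precision_at_k(retrieved, relevant, k=2), recall_at_k(retrieved, relevant, k=2))
```

With one relevant chunk ranked first, precision@2 is 0.5 because one of the two slots is a miss. Recall@2 is 1.0 because the only relevant chunk was found.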