# LangChain RAG (Retrieval-Augmented Generation) - Complete Guide

## Overview

This notebook walks through building a **Retrieval-Augmented Generation (RAG)** system with LangChain. RAG pairs a large language model (LLM) with retrieval from an external knowledge base, so answers can draw on documents the model was never trained on.

### What is RAG?

RAG enhances LLM responses by:
1. **Retrieving** relevant documents from a knowledge base
2. **Augmenting** the prompt with retrieved context
3. **Generating** informed responses based on both the LLM's knowledge and retrieved documents
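
The three steps above can be sketched without any framework. The toy example below is only an illustration under stated assumptions: the knowledge base is three made-up sentences, retrieval is plain word overlap rather than vector search, and `generate` is a placeholder that echoes the prompt instead of calling an LLM.

```python
# Toy illustration of retrieve -> augment -> generate (no LangChain, no API calls).
KNOWLEDGE_BASE = [
    "FAISS is a library for efficient similarity search over dense vectors.",
    "MMR balances relevance and diversity when selecting documents.",
    "Text splitters break long documents into overlapping chunks.",
]


def tokenize(text: str) -> set:
    """Lowercase and split into a set of words, dropping basic punctuation."""
    return {word.strip(".,?!").lower() for word in text.split()}


def retrieve(query: str, k: int = 1) -> list:
    """Step 1: rank documents by word overlap with the query (a stand-in for vector search)."""
    query_words = tokenize(query)
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def augment(query: str, docs: list) -> str:
    """Step 2: place the retrieved text into the prompt."""
    context = "\n".join(f"- {doc}" for doc in docs)
    return f"Context:\n{context}\n\nQuestion: {query}"


def generate(prompt: str) -> str:
    """Step 3: placeholder for the LLM call; it only echoes the prompt it would receive."""
    return f"[LLM would answer using this prompt]\n{prompt}"


query = "What does FAISS do?"
docs = retrieve(query)
answer = generate(augment(query, docs))
print(answer)
```

The rest of the notebook replaces each placeholder with a real component: embeddings and FAISS for retrieval, a prompt template for augmentation, and a chat model for generation.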

### RAG Architecture

```
User Query → Embedding → Vector Search → Retrieved Docs → LLM → Response
                ↓                          ↓
         Vector Store ← Embeddings ← Document Chunks ← Documents
```

### What You'll Learn

- Document loading and preprocessing
- Text splitting strategies
- **OpenAI vs HuggingFace embeddings** (with comparisons)
- Vector store creation with FAISS
- Different retrieval strategies (Similarity vs MMR)
- RAG chain construction
- Performance evaluation and best practices

---

## 1. Setup and Installation

First, we'll install all required dependencies. This includes:
- **langchain**: Core framework
- **langchain-community**: Community integrations
- **langchain-openai**: OpenAI integrations
- **langchain-huggingface**: HuggingFace integrations
- **openai**: OpenAI API client
- **faiss-cpu**: Vector similarity search
- **tiktoken**: Token counting
- **sentence-transformers**: For HuggingFace embeddings

In [None]:
# Install required packages
%pip install -q langchain langchain-community langchain-openai langchain-huggingface openai faiss-cpu tiktoken sentence-transformers
%pip install -q beautifulsoup4 python-dotenv


### Configure API Keys

**Security Best Practice**: Never hardcode API keys in your notebooks. Use environment variables or secure secret management.

For Google Colab, use the `userdata` feature. For local environments, use a `.env` file or environment variables.
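
One way to support both environments is a small lookup helper. The sketch below is an assumption about how you might structure it: `get_secret` is a hypothetical name, and it relies on `google.colab.userdata.get`, which is only importable inside a Colab runtime. Anywhere else it falls back to environment variables.

```python
import os
from typing import Optional


def get_secret(name: str) -> Optional[str]:
    """Return a secret from Colab's userdata store if available, else from the environment."""
    try:
        # google.colab only exists inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        return os.getenv(name)
    try:
        return userdata.get(name)
    except Exception:
        # Assumption: any lookup failure (missing secret, access not granted)
        # should fall back to the environment rather than crash the notebook.
        return os.getenv(name)


os.environ["DEMO_SECRET"] = "demo-value"  # demo value only, not a real key
print(get_secret("DEMO_SECRET"))
```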

In [None]:
import os

# Load environment variables from .env file
from dotenv import load_dotenv
load_dotenv()

# Load API keys from env file and set them explicitly in os.environ
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
HUGGINGFACE_API_KEY = os.getenv("HUGGINGFACE_API_KEY")

# Ensure the keys are set in the environment for libraries to use
if OPENAI_API_KEY:
    os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
if HUGGINGFACE_API_KEY:
    os.environ["HUGGINGFACE_API_KEY"] = HUGGINGFACE_API_KEY

In [None]:
# Verify that API keys are loaded correctly
print("API Keys Status:")
print(f"OPENAI_API_KEY: {'✓ Loaded' if OPENAI_API_KEY else '✗ Not loaded'}")
print(f"HUGGINGFACE_API_KEY: {'✓ Loaded' if HUGGINGFACE_API_KEY else '✗ Not loaded (optional)'}")

# Show only the first 7 and last 4 characters so the full key is never printed
if OPENAI_API_KEY:
    print(f"\nOpenAI Key Preview: {OPENAI_API_KEY[:7]}...{OPENAI_API_KEY[-4:]}")
else:
    print("\n⚠️  WARNING: OPENAI_API_KEY is not set!")

if HUGGINGFACE_API_KEY:
    print(f"HuggingFace Key Preview: {HUGGINGFACE_API_KEY[:7]}...{HUGGINGFACE_API_KEY[-4:]}")
else:
    print("ℹ️  HuggingFace API key not set (not required for local embeddings)")

In [None]:
# Test the OpenAI API key directly
from openai import OpenAI

print("Testing OpenAI API key...")
try:
    client = OpenAI(api_key=OPENAI_API_KEY)
    # Try a simple API call
    response = client.models.list()
    print("✓ API key is VALID! Connection successful.")
    print(f"  Available models: {len(list(response.data))} models found")
except Exception as e:
    print("✗ API key is INVALID!")
    print(f"  Error: {str(e)}")
    print("\n⚠️  Please verify your OpenAI API key:")
    print("  1. Go to https://platform.openai.com/api-keys")
    print("  2. Create a new API key")
    print("  3. Update the .env file with the new key")
    print("  4. Restart the kernel and rerun from the beginning")

---

## 2. Document Loading

The first step in building a RAG system is loading documents. LangChain supports various document loaders:
- **WebBaseLoader**: Load content from web pages
- **PyPDFLoader**: Load PDF files
- **TextLoader**: Load plain text files
- **DirectoryLoader**: Load multiple files from a directory

In this example, we'll load LangChain documentation pages about RAG and related topics.

In [None]:
from langchain_community.document_loaders import WebBaseLoader
import datetime

# Define URLs for LangChain documentation on RAG
urls = [
    "https://python.langchain.com/docs/use_cases/question_answering/",
    "https://python.langchain.com/docs/modules/data_connection/retrievers/",
    "https://python.langchain.com/docs/modules/model_io/llms/",
    "https://python.langchain.com/docs/use_cases/chatbots/"
]

# Initialize WebBaseLoader and load documents
print("Loading documents from web...")
loader = WebBaseLoader(urls)
docs = loader.load()
print(f"✓ Loaded {len(docs)} documents")

# Add custom metadata to documents
# This is useful for filtering and source attribution
current_date = datetime.date.today().isoformat()
for doc in docs:
    doc.metadata['source_type'] = 'web_documentation'
    doc.metadata['process_date'] = current_date
    doc.metadata['domain'] = 'langchain'

print("✓ Added custom metadata to all documents")

# Display first document info
if docs:
    print("\n--- First Document ---")
    print(f"Source: {docs[0].metadata.get('source', 'N/A')}")
    print(f"Content preview (first 300 chars):\n{docs[0].page_content[:300]}...")
    print(f"\nMetadata: {docs[0].metadata}")
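
Text scraped from web pages often contains long runs of blank lines and indentation left over from the HTML layout, which wastes chunk space later. The sketch below shows one possible preprocessing step; `normalize_whitespace` is a hypothetical helper, and the exact cleanup rules are an assumption you should adapt to your pages.

```python
import re


def normalize_whitespace(text: str) -> str:
    """Collapse runs of spaces/tabs and allow at most one blank line in a row."""
    text = re.sub(r"[ \t]+", " ", text)      # collapse horizontal whitespace
    text = re.sub(r" ?\n ?", "\n", text)     # trim spaces around newlines
    text = re.sub(r"\n{3,}", "\n\n", text)   # keep at most one blank line
    return text.strip()


sample = "Title\n\n\n\n   Some   text \n\n\n\nMore text\t here  "
print(repr(normalize_whitespace(sample)))
```

Applied to this notebook, the helper would run in a loop over `docs`, reassigning each `doc.page_content` before splitting.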

---

## 3. Text Splitting Strategies

Large documents must be split into smaller chunks for effective retrieval. The key parameters are:

- **chunk_size**: Maximum number of characters per chunk
- **chunk_overlap**: Number of overlapping characters between chunks

### Why Overlap Matters

Overlap ensures that context isn't lost at chunk boundaries. For example, if a sentence is split between two chunks, overlap helps preserve its meaning.
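
To make the effect of overlap concrete, here is a deliberately simplified fixed-window splitter. It is not how `RecursiveCharacterTextSplitter` works internally (that class first tries to split on separators such as paragraphs and sentences). `split_with_overlap` is a hypothetical helper that only shows how `chunk_size` and `chunk_overlap` interact.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Split text into fixed-size character windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


text = "RAG retrieves documents. The retriever ranks chunks by similarity."
for chunk in split_with_overlap(text, chunk_size=30, chunk_overlap=10):
    print(repr(chunk))
```

The last 10 characters of each chunk reappear at the start of the next one, so a phrase cut at one boundary is still intact in the neighboring chunk.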

### Strategy Comparison

We'll compare two strategies:
1. **Strategy A**: chunk_size=1000, chunk_overlap=200 (better for longer context)
2. **Strategy B**: chunk_size=500, chunk_overlap=100 (better for precise retrieval)

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Strategy A: Larger chunks with more overlap
print("=== Strategy A: Larger Chunks ===")
text_splitter_a = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
chunks_a = text_splitter_a.split_documents(docs)
print(f"Created {len(chunks_a)} chunks with chunk_size=1000, chunk_overlap=200")

# Strategy B: Smaller chunks with less overlap
print("\n=== Strategy B: Smaller Chunks ===")
text_splitter_b = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=100
)
chunks_b = text_splitter_b.split_documents(docs)
print(f"Created {len(chunks_b)} chunks with chunk_size=500, chunk_overlap=100")

# Compare strategies
print("\n=== Comparison ===")
print(f"Strategy A: {len(chunks_a)} chunks (fewer, longer chunks)")
print(f"Strategy B: {len(chunks_b)} chunks (more, shorter chunks)")
print(f"Ratio: {len(chunks_b) / len(chunks_a):.2f}x more chunks with Strategy B")

# Display sample chunks
print("\n--- Strategy A - Sample Chunk ---")
print(f"Length: {len(chunks_a[0].page_content)} chars")
print(f"Content: {chunks_a[0].page_content[:300]}...")

print("\n--- Strategy B - Sample Chunk ---")
print(f"Length: {len(chunks_b[0].page_content)} chars")
print(f"Content: {chunks_b[0].page_content[:300]}...")

# We'll use Strategy A for the rest of the notebook
chunks = chunks_a
print(f"\n✓ Using Strategy A ({len(chunks)} chunks) for subsequent examples")

---

## 4. Embeddings: OpenAI vs HuggingFace

Embeddings convert text into numerical vectors that capture semantic meaning. Similar texts have similar vector representations.
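
"Similar" is usually measured with cosine similarity, the cosine of the angle between two vectors. The sketch below uses made-up 3-dimensional vectors purely for illustration; real embedding models produce hundreds or thousands of dimensions, and the names `cat`, `kitten`, and `invoice` are hypothetical.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between a and b: 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up "embeddings" for illustration only.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(f"cat vs kitten:  {cosine_similarity(cat, kitten):.3f}")
print(f"cat vs invoice: {cosine_similarity(cat, invoice):.3f}")
```

The related pair scores close to 1 and the unrelated pair close to 0, which is the property vector search relies on.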

### Comparison

| Feature | OpenAI Embeddings | HuggingFace Embeddings |
|---------|-------------------|------------------------|
| **Cost** | Pay per token | Free (local) |
| **Speed** | Depends on network latency (API call) | Depends on local hardware (CPU/GPU) |
| **Quality** | Very high | Good to high (model-dependent) |
| **Privacy** | Data sent to OpenAI | Data stays local |
| **Internet** | Required | Not required |

### When to Use Each

- **OpenAI**: Production systems, high quality needed, budget available
- **HuggingFace**: Privacy-sensitive data, cost constraints, offline operation

### 4.1 OpenAI Embeddings

OpenAI's `text-embedding-3-small` model provides high-quality embeddings with good performance.

In [None]:
from langchain_openai import OpenAIEmbeddings
import time

print("Initializing OpenAI Embeddings...")
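# Name the model explicitly so the code matches the text above instead of relying on the library default
openai_embeddings = OpenAIEmbeddings(model="text-embedding-3-small")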
print("✓ OpenAI Embeddings initialized")

# Test embedding generation
test_text = "What is retrieval-augmented generation?"
start_time = time.time()
test_embedding = openai_embeddings.embed_query(test_text)
elapsed = time.time() - start_time

print(f"\nTest Query: '{test_text}'")
print(f"Embedding dimension: {len(test_embedding)}")
print(f"Time taken: {elapsed:.3f}s")
print(f"First 5 values: {test_embedding[:5]}")

### 4.2 HuggingFace Embeddings

We'll use `sentence-transformers/all-MiniLM-L6-v2`, a popular open-source model that provides good quality embeddings with reasonable speed.

In [None]:
from langchain_huggingface import HuggingFaceEmbeddings

print("Initializing HuggingFace Embeddings...")
print("(First run will download the model - this may take a minute)\n")

hf_embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
print("✓ HuggingFace Embeddings initialized")

# Test embedding generation
start_time = time.time()
test_embedding_hf = hf_embeddings.embed_query(test_text)
elapsed_hf = time.time() - start_time

print(f"\nTest Query: '{test_text}'")
print(f"Embedding dimension: {len(test_embedding_hf)}")
print(f"Time taken: {elapsed_hf:.3f}s")
print(f"First 5 values: {test_embedding_hf[:5]}")

### 4.3 Side-by-Side Comparison

Let's compare the two embedding approaches with the same test query.

In [None]:
import numpy as np

print("=== Embeddings Comparison ===")
print(f"\nTest Query: '{test_text}'\n")

comparison_data = [
    ["Feature", "OpenAI", "HuggingFace"],
    ["Dimension", len(test_embedding), len(test_embedding_hf)],
    ["Time (s)", f"{elapsed:.3f}", f"{elapsed_hf:.3f}"],
    ["Mean value", f"{np.mean(test_embedding):.4f}", f"{np.mean(test_embedding_hf):.4f}"],
    ["Std dev", f"{np.std(test_embedding):.4f}", f"{np.std(test_embedding_hf):.4f}"]
]

# Print comparison table
col_widths = [max(len(str(row[i])) for row in comparison_data) + 2 for i in range(3)]
for i, row in enumerate(comparison_data):
    print("".join(str(item).ljust(col_widths[j]) for j, item in enumerate(row)))
    if i == 0:
        print("-" * sum(col_widths))

print("\n💡 Key Takeaway:")
print("   - OpenAI: Higher dimension (1536), typically higher quality")
print("   - HuggingFace: Lower dimension (384), runs locally and free")

---

## 5. Vector Store Creation

Vector stores enable efficient similarity search over embeddings. We use **FAISS** (Facebook AI Similarity Search), which provides:
- Fast similarity search
- Efficient memory usage
- Support for large-scale datasets

We'll create two vector stores to compare both embedding approaches.
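
Conceptually, a vector store answers one question: which stored vectors are closest to the query vector? The sketch below does this by brute force with squared Euclidean distance. It illustrates the idea only and is not FAISS's implementation; `top_k` and the `store` contents are hypothetical.

```python
import heapq


def top_k(query: list, vectors: dict, k: int = 2) -> list:
    """Return (id, squared_distance) for the k stored vectors closest to the query."""
    def sq_dist(vec: list) -> float:
        return sum((q - x) ** 2 for q, x in zip(query, vec))

    scored = ((doc_id, sq_dist(vec)) for doc_id, vec in vectors.items())
    return heapq.nsmallest(k, scored, key=lambda item: item[1])


store = {
    "chunk-a": [0.0, 0.0],
    "chunk-b": [1.0, 1.0],
    "chunk-c": [5.0, 5.0],
}
print(top_k([0.9, 1.2], store, k=2))
```

Brute force compares the query against every stored vector, which is fine for a few thousand chunks. Libraries like FAISS exist to keep this fast as the collection grows.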

### 5.1 Vector Store with OpenAI Embeddings

In [None]:
from langchain_community.vectorstores import FAISS

print("Creating FAISS vector store with OpenAI embeddings...")
start_time = time.time()

vectorstore_openai = FAISS.from_documents(chunks, openai_embeddings)

elapsed = time.time() - start_time
print(f"✓ Vector store created in {elapsed:.2f}s")
print(f"  - {len(chunks)} documents indexed")
print("  - Embedding dimension: 1536")

### 5.2 Vector Store with HuggingFace Embeddings

In [None]:
print("Creating FAISS vector store with HuggingFace embeddings...")
start_time = time.time()

vectorstore_hf = FAISS.from_documents(chunks, hf_embeddings)

elapsed = time.time() - start_time
print(f"✓ Vector store created in {elapsed:.2f}s")
print(f"  - {len(chunks)} documents indexed")
print("  - Embedding dimension: 384")

### 5.3 Test Similarity Search

Let's test similarity search on both vector stores to see how they perform.

In [None]:
query = "How to build a RAG agent with LangChain?"

print(f"Query: '{query}'\n")
print("=" * 80)

# Test OpenAI vector store
print("\n--- OpenAI Embeddings Results ---")
results_openai = vectorstore_openai.similarity_search(query, k=3)
for i, doc in enumerate(results_openai, 1):
    print(f"\n{i}. Source: {doc.metadata.get('source', 'N/A')}")
    print(f"   Content: {doc.page_content[:200]}...")

# Test HuggingFace vector store
print("\n" + "=" * 80)
print("\n--- HuggingFace Embeddings Results ---")
results_hf = vectorstore_hf.similarity_search(query, k=3)
for i, doc in enumerate(results_hf, 1):
    print(f"\n{i}. Source: {doc.metadata.get('source', 'N/A')}")
    print(f"   Content: {doc.page_content[:200]}...")

print("\n" + "=" * 80)
print("\n💡 Notice how both retrievers find relevant documents, though ordering may differ.")

---

## 6. Retrieval Strategies

Different retrieval strategies optimize for different goals:

### Similarity Search
- Returns documents most similar to the query
- Simple and fast
- May return redundant documents

### MMR (Maximal Marginal Relevance)
- Balances relevance with diversity
- Reduces redundancy in results
- Particularly useful when documents contain similar information
- Controlled by `lambda_mult` parameter:
  - `lambda_mult=1.0`: Pure relevance (like similarity search)
  - `lambda_mult=0.0`: Pure diversity
  - `lambda_mult=0.5`: Balanced (recommended)
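
The role of `lambda_mult` is easier to see in a small greedy implementation. The sketch below is a minimal version of the MMR idea, assuming cosine similarity and made-up 2-dimensional vectors. The function names are hypothetical and this is not LangChain's internal code.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedy MMR: return indices of k documents, trading relevance against redundancy."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy = similarity to the closest document already selected.
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query_vec = [1.0, 0.0]
doc_vecs = [
    [1.0, 0.1],    # 0: highly relevant
    [1.0, 0.12],   # 1: near-duplicate of 0
    [0.8, -0.6],   # 2: less relevant, but different from 0
]
print(mmr(query_vec, doc_vecs, k=2, lambda_mult=1.0))  # -> [0, 1]
print(mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5))  # -> [0, 2]
```

With `lambda_mult=1.0` the near-duplicate is picked second because only relevance counts. With `0.5` the redundancy penalty pushes it below the more distinct document.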

Let's compare both strategies using the OpenAI vector store.

### 6.1 Standard Similarity Retriever

In [None]:
# Create similarity-based retriever
similarity_retriever = vectorstore_openai.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 4}  # Retrieve top 4 documents
)

print("✓ Similarity retriever created")
print("  - Search type: similarity")
print("  - Documents to retrieve: 4")

### 6.2 MMR (Maximal Marginal Relevance) Retriever

In [None]:
# Create MMR-based retriever
mmr_retriever = vectorstore_openai.as_retriever(
    search_type="mmr",
    search_kwargs={
        "k": 4,              # Number of documents to return
        "fetch_k": 20,       # Number of documents to fetch before MMR
        "lambda_mult": 0.5   # Balance between relevance (1.0) and diversity (0.0)
    }
)

print("✓ MMR retriever created")
print("  - Search type: mmr")
print("  - Documents to retrieve: 4")
print("  - Fetch size: 20")
print("  - Lambda (relevance/diversity): 0.5")

### 6.3 Compare Retrieval Strategies

In [None]:
query = "What are the steps to build a RAG agent with LangChain?"

print(f"Query: '{query}'\n")
print("=" * 80)

# Test similarity retriever
print("\n--- Similarity Search Results ---")
similarity_docs = similarity_retriever.invoke(query)  # invoke() replaces the deprecated get_relevant_documents()
for i, doc in enumerate(similarity_docs, 1):
    print(f"\n{i}. Source: {doc.metadata.get('source', 'N/A')}")
    print(f"   Content: {doc.page_content[:200]}...")

# Test MMR retriever
print("\n" + "=" * 80)
print("\n--- MMR Search Results ---")
mmr_docs = mmr_retriever.invoke(query)  # invoke() replaces the deprecated get_relevant_documents()
for i, doc in enumerate(mmr_docs, 1):
    print(f"\n{i}. Source: {doc.metadata.get('source', 'N/A')}")
    print(f"   Content: {doc.page_content[:200]}...")

print("\n" + "=" * 80)
print("\n💡 Key Observation:")
print("   MMR results should show more diversity in sources and content")
print("   while still maintaining relevance to the query.")

---

## 7. RAG Chain Construction

Now we'll build complete RAG chains that combine:
1. **LLM**: Generates answers
2. **Retriever**: Finds relevant documents
3. **Prompt**: Structures the input
4. **Chain**: Orchestrates the flow

The chain workflow:
```
User Query → Retriever → Retrieved Docs → Prompt + LLM → Final Answer
```
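
The "Prompt + LLM" step amounts to pasting the retrieved chunks into the prompt before the model sees it. The framework-free sketch below shows that step only. `Doc` is a hypothetical stand-in for a LangChain document, the template text is an assumption, and no model is called.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


SYSTEM_TEMPLATE = "Answer using only the context below.\n\nContext:\n{context}"


def stuff_documents(docs: list, separator: str = "\n\n") -> str:
    """Concatenate retrieved chunks into one context string."""
    return separator.join(doc.page_content for doc in docs)


def build_messages(question: str, docs: list) -> list:
    """Return (role, content) pairs in the order a chat model would receive them."""
    context = stuff_documents(docs)
    return [("system", SYSTEM_TEMPLATE.format(context=context)), ("user", question)]


retrieved = [
    Doc("Retrievers return documents for a query."),
    Doc("FAISS stores embedding vectors."),
]
for role, content in build_messages("What does a retriever do?", retrieved):
    print(f"[{role}]\n{content}\n")
```

Because every chunk is inserted verbatim, `k` and `chunk_size` together determine how much of the model's context window the prompt consumes.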

### 7.1 Initialize LLM

We'll use GPT-4o-mini for cost-effectiveness. For production, consider a larger model such as GPT-4o when answer quality matters more than cost.

In [None]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0  # Minimize sampling randomness for more repeatable answers
)

print("✓ ChatOpenAI LLM initialized")
print("  - Model: gpt-4o-mini")
print("  - Temperature: 0 (most repeatable)")

### 7.2 Create Prompt Template

The prompt instructs the LLM on how to use the retrieved context.

In [None]:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a helpful AI assistant. Answer the user's question based on the context provided below.
    
If the context doesn't contain enough information to answer the question, say so clearly.
Always cite which parts of the context you used to formulate your answer.

Context:
{context}"""),
    ("user", "{input}"),
])

print("✓ Prompt template created")

### 7.3 Build Document Chain and Retrieval Chains

We'll create:
1. A document chain that combines LLM with prompt
2. Retrieval chains for both similarity and MMR strategies

In [None]:
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain

# Create document combining chain
document_chain = create_stuff_documents_chain(llm, prompt)
print("✓ Document chain created")

# Create retrieval chain with similarity search
similarity_retrieval_chain = create_retrieval_chain(
    similarity_retriever, 
    document_chain
)
print("✓ Similarity retrieval chain created")

# Create retrieval chain with MMR
mmr_retrieval_chain = create_retrieval_chain(
    mmr_retriever, 
    document_chain
)
print("✓ MMR retrieval chain created")

print("\n✓ RAG chains ready for inference")

---

## 8. RAG in Action: Comparison & Evaluation

Let's test our RAG chains with various queries and compare their performance.

### 8.1 Test Query with Similarity Retrieval

In [None]:
user_query = "How to build a RAG agent with LangChain?"

print("=" * 80)
print(f"QUERY: {user_query}")
print("=" * 80)
print("\n--- SIMILARITY RETRIEVAL CHAIN ---\n")

# Invoke similarity retrieval chain
response_similarity = similarity_retrieval_chain.invoke({"input": user_query})

print("Retrieved Documents (Context):")
print("-" * 80)
for i, doc in enumerate(response_similarity["context"], 1):
    print(f"\n{i}. Source: {doc.metadata.get('source', 'N/A')}")
    print(f"   Content: {doc.page_content[:300]}...")

print("\n" + "=" * 80)
print("GENERATED ANSWER:")
print("=" * 80)
print(response_similarity["answer"])

### 8.2 Test Same Query with MMR Retrieval

In [None]:
print("=" * 80)
print(f"QUERY: {user_query}")
print("=" * 80)
print("\n--- MMR RETRIEVAL CHAIN ---\n")

# Invoke MMR retrieval chain
response_mmr = mmr_retrieval_chain.invoke({"input": user_query})

print("Retrieved Documents (Context):")
print("-" * 80)
for i, doc in enumerate(response_mmr["context"], 1):
    print(f"\n{i}. Source: {doc.metadata.get('source', 'N/A')}")
    print(f"   Content: {doc.page_content[:300]}...")

print("\n" + "=" * 80)
print("GENERATED ANSWER:")
print("=" * 80)
print(response_mmr["answer"])

### 8.3 Compare OpenAI vs HuggingFace Embeddings

Let's create a retrieval chain using the HuggingFace vector store and compare results.

In [None]:
# Create retriever from HuggingFace vector store
hf_retriever = vectorstore_hf.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 4}
)

# Create retrieval chain with HuggingFace embeddings
hf_retrieval_chain = create_retrieval_chain(hf_retriever, document_chain)

print("=" * 80)
print(f"QUERY: {user_query}")
print("=" * 80)
print("\n--- HUGGINGFACE EMBEDDINGS CHAIN ---\n")

# Invoke HuggingFace retrieval chain
response_hf = hf_retrieval_chain.invoke({"input": user_query})

print("Retrieved Documents (Context):")
print("-" * 80)
for i, doc in enumerate(response_hf["context"], 1):
    print(f"\n{i}. Source: {doc.metadata.get('source', 'N/A')}")
    print(f"   Content: {doc.page_content[:300]}...")

print("\n" + "=" * 80)
print("GENERATED ANSWER:")
print("=" * 80)
print(response_hf["answer"])

### 8.4 Summary Comparison

In [None]:
print("=" * 80)
print("COMPARISON SUMMARY")
print("=" * 80)

print("\n1. OpenAI Embeddings + Similarity Search")
print(f"   Answer length: {len(response_similarity['answer'])} chars")
print(f"   Documents retrieved: {len(response_similarity['context'])}")

print("\n2. OpenAI Embeddings + MMR Search")
print(f"   Answer length: {len(response_mmr['answer'])} chars")
print(f"   Documents retrieved: {len(response_mmr['context'])}")

print("\n3. HuggingFace Embeddings + Similarity Search")
print(f"   Answer length: {len(response_hf['answer'])} chars")
print(f"   Documents retrieved: {len(response_hf['context'])}")

print("\n" + "=" * 80)
print("\n💡 Key Insights:")
print("   - All approaches provide relevant answers")
print("   - MMR may provide more diverse context")
print("   - HuggingFace embeddings are competitive and free")
print("   - Choice depends on: budget, privacy needs, and quality requirements")
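
Answer length and document count say little about quality. A more informative (though still basic) check is a retrieval hit rate: for a set of test questions with a known expected source, how often does that source appear in the retrieved results? The sketch below assumes a hypothetical `fake_retrieve` function and made-up source ids so it runs without any API calls.

```python
from typing import Callable


def hit_rate(retrieve: Callable, test_cases: list) -> float:
    """Fraction of questions whose expected source appears among the retrieved sources."""
    if not test_cases:
        return 0.0
    hits = sum(
        1 for question, expected_source in test_cases
        if expected_source in retrieve(question)
    )
    return hits / len(test_cases)


# Hypothetical retriever for the demo: maps a keyword to fixed source ids.
def fake_retrieve(question: str) -> list:
    return ["docs/retrievers"] if "retriever" in question.lower() else ["docs/chatbots"]


cases = [
    ("What is a retriever?", "docs/retrievers"),
    ("How do chatbots keep memory?", "docs/chatbots"),
    ("How are retrievers configured?", "docs/llms"),
]
print(f"Hit rate: {hit_rate(fake_retrieve, cases):.2f}")
```

With the notebook's objects, `retrieve` could wrap a retriever call and return each document's `metadata["source"]`, which lets you compare the similarity, MMR, and HuggingFace setups on the same test set.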

---

## 9. Advanced Features

### 9.1 Custom Metadata Filtering

We added custom metadata earlier. Now let's use it to filter results.

In [None]:
# Example: Filter by source domain
print("Sample metadata from our documents:")
sample_doc = chunks[0]
print(f"\nMetadata: {sample_doc.metadata}")

# Create a retriever with metadata filter
filtered_retriever = vectorstore_openai.as_retriever(
    search_type="similarity",
    search_kwargs={
        "k": 4,
        "filter": {"source_type": "web_documentation"}  # Filter by our custom metadata
    }
)

print("\n✓ Filtered retriever created")
print("  - Filter: source_type = 'web_documentation'")

# Test filtered retrieval
query = "What is a retriever in LangChain?"
filtered_docs = filtered_retriever.invoke(query)  # invoke() replaces the deprecated get_relevant_documents()

print(f"\nQuery: '{query}'")
print(f"Retrieved {len(filtered_docs)} documents with metadata filter\n")

for i, doc in enumerate(filtered_docs[:2], 1):
    print(f"{i}. Source: {doc.metadata.get('source', 'N/A')}")
    print(f"   Source Type: {doc.metadata.get('source_type', 'N/A')}")
    print(f"   Process Date: {doc.metadata.get('process_date', 'N/A')}")
    print(f"   Content: {doc.page_content[:150]}...\n")

### 9.2 Source Attribution

Show which sources were used to generate the answer.

In [None]:
query = "What are the key components of a RAG system?"

print(f"Query: {query}\n")
print("=" * 80)

response = similarity_retrieval_chain.invoke({"input": query})

print("\nANSWER:")
print("-" * 80)
print(response["answer"])

print("\n" + "=" * 80)
print("\nSOURCES USED:")
print("-" * 80)

# Extract unique sources
sources = set()
for doc in response["context"]:
    source = doc.metadata.get('source', 'Unknown')
    sources.add(source)

for i, source in enumerate(sorted(sources), 1):
    print(f"{i}. {source}")

print("\n💡 Always cite sources to build trust and enable verification!")

---

## 10. Best Practices & Common Pitfalls

### Best Practices

1. **Chunk Size Selection**
   - Smaller chunks (300-500): Better for precise information retrieval
   - Larger chunks (800-1200): Better for context-heavy questions
   - Always use overlap (100-200 chars) to preserve context

2. **Embedding Selection**
   - **OpenAI**: Best quality, suitable for production, requires API key
   - **HuggingFace**: Free, private, good for development and privacy-sensitive data
   - Test both with your specific use case

3. **Retrieval Strategy**
   - **Similarity**: Use for most cases, simple and effective
   - **MMR**: Use when you need diverse results and want to avoid redundancy
   - Experiment with `k` (number of documents) - typically 3-5 is good

4. **Prompt Engineering**
   - Always instruct the model to say when it doesn't know
   - Request source citations for transparency
   - Be specific about the expected format

5. **Metadata Management**
   - Add custom metadata for filtering and attribution
   - Include source URLs, dates, document types
   - Use metadata for access control in production

### Common Pitfalls to Avoid

#### 1. Undefined Variables
```python
# ❌ WRONG: Using retriever before defining it
chain = create_retrieval_chain(mmr_retriever, document_chain)
mmr_retriever = vectorstore.as_retriever(search_type="mmr")

# ✅ CORRECT: Define before using
mmr_retriever = vectorstore.as_retriever(search_type="mmr")
chain = create_retrieval_chain(mmr_retriever, document_chain)
```

#### 2. Not Using Created Objects
```python
# ❌ WRONG: Creating embeddings but not using them
hf_embeddings = HuggingFaceEmbeddings()
vectorstore = FAISS.from_documents(chunks, openai_embeddings)  # Uses OpenAI instead!

# ✅ CORRECT: Use what you create
hf_embeddings = HuggingFaceEmbeddings()
vectorstore = FAISS.from_documents(chunks, hf_embeddings)
```

#### 3. Incorrect Retriever Configuration
```python
# ❌ WRONG: Invalid search_type
retriever = vectorstore.as_retriever(search_type="mmr_search")  # Invalid type!

# ✅ CORRECT: Valid search types
retriever = vectorstore.as_retriever(search_type="similarity")
# OR
retriever = vectorstore.as_retriever(search_type="mmr")
```

#### 4. Deprecated Import Paths
```python
# ❌ WRONG: Importing from the old package
from langchain_community.embeddings import HuggingFaceEmbeddings  # Deprecated!

# ✅ CORRECT: Use the right package
from langchain_huggingface import HuggingFaceEmbeddings
```

#### 5. Not Handling Edge Cases
```python
# ❌ WRONG: No error handling
response = chain.invoke({"input": query})
print(response["answer"])

# ✅ CORRECT: Handle potential errors
try:
    response = chain.invoke({"input": query})
    if "answer" in response:
        print(response["answer"])
    else:
        print("No answer generated")
except Exception as e:
    print(f"Error: {e}")
```

### Performance Optimization Tips

1. **Batch Processing**: Process multiple documents at once when creating embeddings
2. **Caching**: Save and load vector stores instead of recreating them
3. **Async Operations**: Use async methods for parallel processing
4. **Indexing**: For large datasets, consider more sophisticated indexing strategies
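
Batching and caching can be combined in a thin wrapper around whatever function produces embeddings. The sketch below is one possible shape, not a LangChain API: `CachedEmbedder` and `fake_embed_batch` are hypothetical, and the cache lives in memory only.

```python
import hashlib


class CachedEmbedder:
    """Wrap a batch embedding function and reuse results for texts seen before."""

    def __init__(self, embed_batch):
        self._embed_batch = embed_batch  # callable: list of str -> list of vectors
        self._cache = {}

    @staticmethod
    def _key(text):
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def embed(self, texts):
        # Collect uncached texts, deduplicated, preserving first-seen order.
        missing = list(dict.fromkeys(t for t in texts if self._key(t) not in self._cache))
        if missing:
            # One batched call for everything that is new.
            for text, vector in zip(missing, self._embed_batch(missing)):
                self._cache[self._key(text)] = vector
        return [self._cache[self._key(t)] for t in texts]


calls = []


def fake_embed_batch(texts):
    """Stand-in for a real embedding call; records each batch it receives."""
    calls.append(list(texts))
    return [[float(len(t))] for t in texts]


embedder = CachedEmbedder(fake_embed_batch)
embedder.embed(["alpha", "beta", "alpha"])
embedder.embed(["beta", "gamma"])
print(calls)  # only new texts reach the embedding function
```

For the vector store itself, LangChain's FAISS wrapper provides `save_local` and `load_local`, so an index built once can be reloaded instead of re-embedding every chunk.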

### Production Considerations

1. **Error Handling**: Add comprehensive error handling and logging
2. **Rate Limiting**: Respect API rate limits (especially OpenAI)
3. **Monitoring**: Track retrieval quality, latency, and costs
4. **Security**: Sanitize user inputs, manage API keys securely
5. **Versioning**: Track model versions and embeddings for reproducibility
6. **Evaluation**: Regularly evaluate RAG quality with test questions
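
For error handling and rate limits, a common pattern is to retry failed calls with exponential backoff. The sketch below is a generic version under stated assumptions: `call_with_retries` and `flaky_call` are hypothetical, and it retries on any exception. In real code you would catch only the transient errors your client raises, such as rate-limit or timeout errors.

```python
import random
import time


def call_with_retries(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt (plus jitter) and retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)


attempts = {"count": 0}


def flaky_call():
    """Simulated API call that fails twice before succeeding."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated rate limit")
    return "ok"


print(call_with_retries(flaky_call, base_delay=0.01))
```

In this notebook, `fn` could be a lambda wrapping a chain's `invoke` call, with logging added in the `except` branch.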

---

## Conclusion

You now have a complete, working RAG pipeline to build on. This notebook covered:

✅ Document loading and preprocessing  
✅ Text splitting strategies  
✅ OpenAI vs HuggingFace embeddings comparison  
✅ Vector store creation  
✅ Similarity vs MMR retrieval strategies  
✅ Complete RAG chain construction  
✅ Advanced features (metadata, filtering, source attribution)  
✅ Best practices and common pitfalls  

### Next Steps

1. **Experiment** with different chunk sizes and retrieval strategies
2. **Evaluate** performance on your specific use case
3. **Add** your own documents and data sources
4. **Enhance** with conversational memory for multi-turn dialogues
5. **Deploy** to production with proper monitoring

### Resources

- [LangChain Documentation](https://python.langchain.com/)
- [FAISS Documentation](https://github.com/facebookresearch/faiss)
- [OpenAI Embeddings](https://platform.openai.com/docs/guides/embeddings)
- [Sentence Transformers](https://www.sbert.net/)

---

**Happy Building! 🚀**