# Advanced RAG Techniques

**GenAI Foundation Training - Day 2**

---

## Prerequisites

**Before starting this notebook, you should have:**

✅ **Completed Notebook 03 (LangChain Essentials)** - LCEL, document loaders, text splitters, vector stores, basic RAG pipeline  
✅ Understanding of embeddings and similarity search  
✅ Familiarity with ChromaDB basics  

**Note**: This notebook builds on the RAG fundamentals from Notebook 03. If you haven't completed it yet, please do so first.

---

## What You'll Learn

In this notebook, you'll learn **advanced RAG techniques** beyond the basics:

1. ✅ **Hybrid Search** - Combining semantic and keyword search
2. ✅ **Query Expansion** - Generating multiple search queries
3. ✅ **Reranking** - Improving retrieval quality with rerankers
4. ✅ **Contextual Compression** - Extracting relevant excerpts
5. ✅ **Multi-Query Retrieval** - Parallel searches with different perspectives
6. ✅ **Production Optimizations** - Caching, batching, streaming
7. ✅ **Advanced Citations** - Relevance scores and metadata tracking

---

## What is Advanced RAG?

While basic RAG (from Notebook 03) works well for many use cases, production systems often need:

- **Better retrieval quality**: Not all relevant documents rank highest
- **Query understanding**: Handle complex or ambiguous questions
- **Context optimization**: Too much context raises cost and latency; too little leads to poor answers
- **Performance**: Reduce latency and API costs

### Advanced RAG Techniques

| Technique | Problem Solved | Example |
|-----------|----------------|---------|
| **Hybrid Search** | Pure semantic search misses exact matches | "GPT-4" vs "generative model" |
| **Query Expansion** | Single query misses related docs | "ML bias" → ["algorithmic bias", "fairness in AI"] |
| **Reranking** | Top-K results may not be most relevant | Rerank with cross-encoder |
| **Compression** | Retrieved docs too long/irrelevant | Extract only relevant sentences |
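
As a rough illustration of the first row, one common way to merge a keyword ranking with a semantic ranking is reciprocal rank fusion (RRF). The notebook does not say which fusion method it has in mind, so RRF itself, the function name `reciprocal_rank_fusion`, the constant `k=60`, and the document IDs below are all assumptions made for this sketch.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in
    (rank starts at 1); the scores are summed across lists.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; break ties by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for the query "GPT-4 pricing"
keyword_ranking = ["doc_gpt4", "doc_pricing", "doc_llm_overview"]
semantic_ranking = ["doc_pricing", "doc_llm_overview", "doc_gpt4"]

fused = reciprocal_rank_fusion([keyword_ranking, semantic_ranking])
print(fused)
```

A document that ranks well in both lists (here `doc_pricing`) ends up ahead of one that tops only a single list, which is the behavior hybrid search is after.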

---

## What We'll Build

By the end of this notebook, you'll have:

🎯 Hybrid search combining semantic + keyword  
🎯 Multi-query retrieval system  
🎯 Reranking pipeline for improved quality  
🎯 Production-optimized RAG with caching  

Let's dive into advanced RAG! 🚀

---

# Section 1: What is LangChain?

Before diving into RAG, let's understand the framework we'll use.

## What is LangChain?

**LangChain** is a framework for building applications powered by large language models (LLMs).

### Why Use LangChain?

✅ **Abstracts Complexity** - Handles boilerplate code for you  
✅ **Reusable Components** - Pre-built loaders, splitters, chains  
✅ **Production-Ready** - Battle-tested patterns  
✅ **Provider Agnostic** - Works with OpenAI, Anthropic, Google, etc.  

## Key Concepts We'll Use Today

| Component | What It Does | Example |
|-----------|--------------|----------|
| **Document Loaders** | Load files (PDF, TXT, web) | PyPDFLoader, TextLoader |
| **Text Splitters** | Chunk documents intelligently | RecursiveCharacterTextSplitter |
| **Vector Stores** | Integrate with vector DBs | Chroma, FAISS |
| **Chains** | Connect components together | LCEL pipelines (pipe syntax) |
| **Memory** | Maintain conversation history | Chat history passed via `MessagesPlaceholder` |

## The LangChain RAG Flow

```
Document Loaders → Text Splitters → Embeddings → Vector Stores → Retrieval Chains → Complete RAG
```

## Important Note

📝 **Today's Focus**: We'll learn **minimal LangChain** - just what's needed for RAG.

🔮 **Later in Training**: We'll cover **LangChain/LangGraph advanced patterns** (custom chains, agents, multi-step workflows).

---

Let's start building! 🛠️

---

# Section 2: Package Installation

## Installing Latest LangChain Packages (December 2025)

⚠️ **Important**: LangChain packages are now **modular**. We need separate packages for different integrations.

### Packages We'll Install:

| Package | Version | Purpose |
|---------|---------|----------|
| `langchain` | >=0.3.0 | Core framework |
| `langchain-openai` | >=0.2.0 | OpenAI integrations (ChatOpenAI, embeddings) |
| `langchain-chroma` | >=0.1.2 | ChromaDB integration (**separate package**) |
| `langchain-community` | >=0.3.0 | Community integrations (PyPDFLoader, FAISS) |
| `langchain-text-splitters` | >=0.3.0 | Text splitting (**separate package**) |
| `pypdf` | >=5.1.0 | PDF parsing backend |
| `chromadb` | >=0.4.0 | Vector database client |

Let's install them:

In [None]:
# Uninstall existing langchain packages (Colab clean slate)
!pip uninstall -y langchain langchain-core langchain-community langchain-openai langchain-chroma langchain-text-splitters

# Install with compatible versions (let pip resolve dependencies)
!pip install -qU \
    langchain \
    langchain-openai \
    langchain-chroma \
    langchain-community \
    langchain-text-splitters \
    pypdf \
    chromadb

# Show installed versions for verification
!pip list | grep langchain

print("\n✅ All packages installed successfully!")

### Verify Installation

Let's verify that the packages are installed correctly:

In [None]:
import langchain
import langchain_openai
import langchain_chroma
import langchain_community
import langchain_text_splitters
import chromadb

print(f"✅ LangChain version: {langchain.__version__}")
print(f"✅ ChromaDB version: {chromadb.__version__}")
print("\n🎉 All imports successful! Ready to build RAG.")

### 🔍 Verify Imports

After installation, let's verify that the critical imports work correctly:

In [None]:
# Verify LCEL imports (modern LangChain 1.x approach)
try:
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.runnables import RunnablePassthrough
    print("✅ LCEL imports successful!")
    print("ℹ️  Note: LangChain 1.x uses LCEL (pipe syntax) instead of legacy chains")
except ImportError as e:
    print(f"❌ Import failed: {e}")
    print("\n🔧 Troubleshooting:")
    print("1. Restart Colab runtime: Runtime > Restart runtime")
    print("2. Re-run installation cell above")

### Setup API Keys

We'll need an OpenAI API key for this notebook:

In [None]:
import os
import getpass

# Set OpenAI API key
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key: ")

print("✅ API key set successfully!")

---

### 📌 2025 LangChain Updates

**Important**: This notebook uses the latest 2025 LangChain APIs.

**Key Import Changes**:
- `langchain_community` - Community integrations (loaders, vector stores)
- `langchain_chroma` - ChromaDB integration (NEW package)
- `langchain_text_splitters` - Text splitting (NEW separate package)
- LCEL pipelines - Used in this notebook in place of the legacy `RetrievalQA` and `create_retrieval_chain` helpers

All imports below use current 2025 standards.

---

# Section 3: Document Loading with LangChain

The first step in RAG is loading documents. LangChain provides **Document Loaders** for this.

## What is a Document Loader?

A Document Loader:
- Reads files from various sources (PDF, TXT, web, databases)
- Extracts text content
- Preserves metadata (source, page numbers, etc.)

## Document Structure

Each loaded document has:
- `page_content`: The actual text
- `metadata`: Dictionary with source info (file path, page number, etc.)
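
To make the two fields concrete, here is a dependency-free stand-in for a loaded document. `SimpleDocument` is a hypothetical class written for illustration; it mirrors only the two attributes described above and is not LangChain's own document type.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Illustrative stand-in with the same two fields as a loaded document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


doc = SimpleDocument(
    page_content="Machine learning is a subset of artificial intelligence.",
    metadata={"source": "ml_guide.txt"},
)

print(doc.page_content[:16])    # the text is a plain string
print(doc.metadata["source"])   # the metadata is a plain dict
```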

## Common Document Loaders

| Loader | File Type | Use Case |
|--------|-----------|----------|
| `PyPDFLoader` | PDF | Research papers, reports |
| `TextLoader` | TXT | Plain text files |
| `DirectoryLoader` | Multiple files | Bulk loading |
| `WebBaseLoader` | Web pages | Scrape websites |

Let's load a sample document!

### Create a Sample Document

First, let's create a sample text file to work with:

In [None]:
# Create a sample document about Machine Learning
sample_content = """Machine Learning: A Comprehensive Guide

Introduction to Machine Learning
Machine learning is a subset of artificial intelligence (AI) that focuses on building systems that can learn from and make decisions based on data. Unlike traditional programming where rules are explicitly coded, machine learning algorithms learn patterns from data.

Types of Machine Learning
There are three main types of machine learning:

1. Supervised Learning: The algorithm learns from labeled data. Examples include classification and regression tasks. Common algorithms include linear regression, logistic regression, decision trees, and neural networks.

2. Unsupervised Learning: The algorithm finds patterns in unlabeled data. Examples include clustering and dimensionality reduction. Common algorithms include K-means clustering and principal component analysis (PCA).

3. Reinforcement Learning: The algorithm learns through trial and error by receiving rewards or penalties. This is commonly used in robotics, game playing, and autonomous systems.

Deep Learning
Deep learning is a subset of machine learning that uses neural networks with multiple layers (deep neural networks). It has revolutionized fields like computer vision, natural language processing, and speech recognition. Popular frameworks include TensorFlow, PyTorch, and Keras.

Applications of Machine Learning
Machine learning is used in various domains:
- Healthcare: Disease diagnosis, drug discovery
- Finance: Fraud detection, algorithmic trading
- E-commerce: Recommendation systems, demand forecasting
- Transportation: Autonomous vehicles, route optimization
- Natural Language Processing: Chatbots, translation, sentiment analysis

Challenges in Machine Learning
Despite its success, machine learning faces several challenges:
- Data quality and quantity requirements
- Model interpretability and explainability
- Bias and fairness concerns
- Computational resource requirements
- Overfitting and generalization issues

The Future of Machine Learning
The field continues to evolve with trends like AutoML, federated learning, and edge AI. As computing power increases and algorithms improve, machine learning will become even more integral to our daily lives.
"""

# Save to file
with open("ml_guide.txt", "w") as f:
    f.write(sample_content)

print("✅ Sample document created: ml_guide.txt")
print(f"Document length: {len(sample_content)} characters")

### Load Document with TextLoader

Now let's load our sample document:

In [None]:
from langchain_community.document_loaders import TextLoader

# Load the document
loader = TextLoader("ml_guide.txt")
documents = loader.load()

print(f"✅ Loaded {len(documents)} document(s)")
print(f"\nDocument structure:")
print(f"- page_content: {len(documents[0].page_content)} characters")
print(f"- metadata: {documents[0].metadata}")

print(f"\nFirst 300 characters:")
print(documents[0].page_content[:300] + "...")

### Understanding Document Metadata

Metadata is crucial for RAG because it enables:
- **Citations**: Show users where answers came from
- **Filtering**: Search only specific sources
- **Tracking**: Monitor which documents are most useful

In [None]:
# Inspect metadata
for doc in documents:
    print("Metadata:")
    for key, value in doc.metadata.items():
        print(f"  {key}: {value}")
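
The filtering and tracking uses listed above can be sketched without any vector store. The helper names `filter_by_source` and `count_sources` and the file `report.txt` are hypothetical, and documents are represented as plain dicts rather than LangChain objects to keep the example self-contained.

```python
from collections import Counter


def filter_by_source(docs, source):
    """Keep only documents whose metadata names the given source."""
    return [d for d in docs if d["metadata"].get("source") == source]


def count_sources(retrieved_docs):
    """Count how often each source shows up among retrieved documents."""
    return Counter(d["metadata"].get("source", "Unknown") for d in retrieved_docs)


docs = [
    {"page_content": "Supervised learning uses labeled data.", "metadata": {"source": "ml_guide.txt"}},
    {"page_content": "Fraud detection is a finance use case.", "metadata": {"source": "ml_guide.txt"}},
    {"page_content": "Quarterly revenue grew.", "metadata": {"source": "report.txt"}},
]

only_guide = filter_by_source(docs, "ml_guide.txt")
usage = count_sources(docs)
print(len(only_guide), usage.most_common(1))
```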

### 📝 Key Takeaways

✅ Document loaders extract text and preserve metadata  
✅ Use `langchain_community.document_loaders` for imports (2025)  
✅ Each document has `page_content` (text) and `metadata` (source info)  
✅ Metadata enables citations and filtering  

**Next**: We'll chunk these documents into smaller pieces for better retrieval! 📄➡️📄📄📄

---

# Section 4: Advanced Text Splitting with LangChain

You already learned chunking in the previous notebook. Now let's see how **LangChain** makes it even better!

## Why Use LangChain Text Splitters?

| Manual Chunking (Previous Notebook) | LangChain Splitters |
|-------------------------------------|---------------------|
| We wrote chunking logic ourselves | Pre-built, tested splitters |
| Basic fixed-size or sentence split | Intelligent recursive splitting |
| Manual edge case handling | Handles edge cases automatically |
| Good for learning | Production-ready |

## LangChain Text Splitters

### RecursiveCharacterTextSplitter (Recommended)

This splitter:
- Tries to split on paragraphs (`\n\n`) first
- Falls back to line breaks (`\n`)
- Then sentences (`. `)
- Then words (` `)
- Finally individual characters

This preserves semantic meaning better!
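
The fallback order can be sketched in plain Python. This is a simplified illustration of the idea, not the library's implementation: the hypothetical `recursive_split` below has no overlap, drops the separator at each split point, and uses a tiny `chunk_size` so the behavior is visible on a short string.

```python
def recursive_split(text, chunk_size=30, separators=("\n\n", ". ", " ")):
    """Split text into pieces of at most chunk_size characters,
    trying coarse separators before fine ones."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # Last resort: cut at fixed character offsets.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, remaining = separators[0], separators[1:]
    pieces = text.split(sep)
    if len(pieces) == 1:
        # Separator not present: try the next, finer one.
        return recursive_split(text, chunk_size, remaining)

    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece is still too big: recurse with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, remaining))
    if current:
        chunks.append(current)
    return chunks


text = "Para one is short.\n\nPara two is a bit longer. It has two sentences."
for chunk in recursive_split(text):
    print(repr(chunk))
```

The first paragraph fits and is kept whole; only the second, longer paragraph is broken down further at the sentence boundary.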

⚠️ **Import Change (2025)**: Text splitters are now in `langchain_text_splitters` package

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Create text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,  # Characters per chunk
    chunk_overlap=50,  # Overlap between chunks (10%)
    separators=["\n\n", "\n", ". ", " ", ""]  # Try these in order
)

# Split our documents
chunks = text_splitter.split_documents(documents)

print(f"✅ Split {len(documents)} document(s) into {len(chunks)} chunks")
print(f"\nFirst chunk preview:")
print(chunks[0].page_content[:200] + "...")
print(f"\nChunk metadata: {chunks[0].metadata}")

### Why 512 Characters + 50 Overlap?

✅ **Sweet spot**: 512 characters ≈ 128 tokens, assuming roughly 4 characters per token for English text  
✅ **Overlap**: Maintains context across chunks  
✅ **Not too small**: Enough context for LLM  
✅ **Not too large**: Precise retrieval  
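
The arithmetic behind these numbers can be checked with a short sketch. Both helpers are hypothetical, the 4-characters-per-token ratio is a rule of thumb rather than a tokenizer result, and the chunk count assumes a plain fixed-size sliding window; a separator-aware splitter usually produces somewhat more, smaller chunks.

```python
import math


def estimate_tokens(num_chars, chars_per_token=4):
    """Rough token estimate from a character count (rule of thumb only)."""
    return round(num_chars / chars_per_token)


def estimate_chunk_count(doc_length, chunk_size=512, chunk_overlap=50):
    """Chunk count for a fixed-size sliding window over doc_length characters."""
    if doc_length <= chunk_size:
        return 1
    stride = chunk_size - chunk_overlap  # new characters covered by each extra chunk
    return math.ceil((doc_length - chunk_size) / stride) + 1


print(estimate_tokens(512))         # 128
print(estimate_chunk_count(2400))   # 6
```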

### Compare with Manual Chunking

Remember from the previous notebook? We did this manually:

```python
# Manual chunking (what we did before)
# Note: this counts whitespace-separated words as a stand-in for tokens
def chunk_by_tokens(text, chunk_size=512, overlap=50):
    words = text.split()
    chunks = []
    for i in range(0, len(words), chunk_size - overlap):
        chunk = ' '.join(words[i:i + chunk_size])
        chunks.append(chunk)
    return chunks
```

**LangChain's advantage**: Handles edge cases, preserves metadata, smarter separators!

In [None]:
# Inspect chunk sizes
chunk_sizes = [len(chunk.page_content) for chunk in chunks]

print(f"Chunk size statistics:")
print(f"- Average: {sum(chunk_sizes) / len(chunk_sizes):.0f} characters")
print(f"- Min: {min(chunk_sizes)} characters")
print(f"- Max: {max(chunk_sizes)} characters")
print(f"\nAll chunks have metadata: {all(chunk.metadata for chunk in chunks)}")

### üìù Key Takeaways

✅ LangChain splitters are **production-ready** versions of what we learned  
✅ Use `langchain_text_splitters.RecursiveCharacterTextSplitter` (2025)  
✅ 512 characters + 50 overlap is a good default  
✅ Splitters preserve metadata automatically  

**Next**: We'll store these chunks in a vector database! 🗄️

---

# Section 5: Vector Stores with LangChain

You already know ChromaDB. Now let's use LangChain's wrapper!

⚠️ **Import Change (2025)**: Chroma is now in `langchain_chroma` package

In [None]:
# ✅ CORRECT imports (2025)
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vector_store = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    collection_name="rag_demo"
)

print(f"✅ Created vector store with {len(chunks)} chunks")

In [None]:
# Test search
results = vector_store.similarity_search("What is deep learning?", k=3)

for i, doc in enumerate(results, 1):
    print(f"{i}. {doc.page_content[:100]}...\n")
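
Conceptually, a similarity search compares the query's embedding with every stored chunk embedding and returns the closest ones. The sketch below shows that idea with hand-made 3-dimensional vectors and cosine similarity. The vectors, chunk IDs, and helper names are invented for illustration; real embeddings have far more dimensions, and the distance metric a vector store actually uses is configurable and may not be cosine.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=3):
    """Return the k chunk IDs whose vectors are most similar to the query."""
    ranked = sorted(
        store,
        key=lambda chunk_id: cosine_similarity(query_vec, store[chunk_id]),
        reverse=True,
    )
    return ranked[:k]


# Made-up 3-dimensional "embeddings"
store = {
    "deep_learning": [0.9, 0.1, 0.0],
    "finance_apps": [0.0, 0.2, 0.9],
    "neural_nets": [0.8, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]

print(top_k(query, store, k=2))
```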

---

# Section 6: Retrieval Chains with LCEL

### 📌 Modern LangChain 1.x: LCEL (Expression Language)

**This notebook uses LCEL - the modern LangChain approach.**

**What is LCEL?**
- ✅ Pipe operator syntax: `retriever | format | prompt | llm | parser`
- ✅ Composable: Chain components together naturally
- ✅ Built-in streaming, async, and batch processing
- ✅ Recommended for all new LangChain 1.x code

**Why LCEL instead of legacy chains?**
- ❌ `create_retrieval_chain` - no longer part of the main `langchain` package in LangChain 1.0 (legacy chains moved to `langchain-classic`)
- ❌ `ConversationalRetrievalChain` - likewise moved out of the main package in LangChain 1.0
- ✅ LCEL is simpler, more composable, and the recommended path going forward

**LCEL RAG Pattern**:
```python
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt | llm | StrOutputParser()
)
```
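
To build intuition for the pipe syntax, the toy class below overloads Python's `|` operator so that plain functions can be chained. `Step` is a hypothetical teaching aid, not LangChain's runnable interface; it only shows how `a | b` can mean "run `a`, then feed its output to `b`".

```python
class Step:
    """Toy wrapper that lets plain functions be chained with `|`."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # a | b returns a new Step that runs a, then passes its output to b
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


strip = Step(str.strip)
upper = Step(str.upper)
exclaim = Step(lambda s: s + "!")

chain = strip | upper | exclaim
print(chain.invoke("  hello lcel  "))  # HELLO LCEL!
```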

Let's build a RAG chain using LCEL!

In [None]:
# Build RAG chain using LCEL (LangChain Expression Language)
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Setup components
retriever = vector_store.as_retriever(search_kwargs={"k": 3})
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Create prompt template
template = """Answer the question based on the following context:

Context: {context}

Question: {question}

Answer:"""

prompt = ChatPromptTemplate.from_template(template)

# Helper function to format documents
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Build LCEL chain with retriever
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print("✅ RAG chain created using LCEL!")

In [None]:
# Test the RAG chain with LCEL
question = "What are the types of machine learning?"

# With LCEL, we invoke with the question directly (simpler!)
answer = rag_chain.invoke(question)

print(f"Question: {question}\n")
print(f"Answer: {answer}")
print("\n✅ LCEL makes RAG simple and clean!")

---

# Section 7: Complete RAG Pipeline with LCEL

Let's create a reusable RAG pipeline class using LCEL:

In [None]:
# Imports (global scope - best practice)
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

class RAGPipeline:
    """Production-ready RAG pipeline using LCEL"""

    def __init__(self, file_path):
        # Load and chunk documents
        loader = TextLoader(file_path)
        documents = loader.load()

        text_splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=50)
        chunks = text_splitter.split_documents(documents)

        # Create vector store
        embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
        self.vector_store = Chroma.from_documents(chunks, embeddings)
        self.retriever = self.vector_store.as_retriever(search_kwargs={"k": 3})
        self.llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

        # Helper to format docs
        def format_docs(docs):
            return "\n\n".join(doc.page_content for doc in docs)

        # Build LCEL chain
        template = """Answer based on context:

Context: {context}

Question: {question}

Answer:"""

        prompt = ChatPromptTemplate.from_template(template)

        self.rag_chain = (
            {"context": self.retriever | format_docs, "question": RunnablePassthrough()}
            | prompt
            | self.llm
            | StrOutputParser()
        )

        print(f"✅ RAG Pipeline ready with {len(chunks)} chunks (using LCEL)")

    def ask(self, question):
        """Ask a question and get answer with sources"""
        answer = self.rag_chain.invoke(question)
        # Get source docs separately for citations
        # (this is a second retriever call for the same question)
        docs = self.retriever.invoke(question)
        return {"answer": answer, "sources": docs}

# Create pipeline
pipeline = RAGPipeline("ml_guide.txt")

In [None]:
# Test the pipeline
response = pipeline.ask("What are machine learning challenges?")

print("Answer:", response["answer"])
print(f"\n✅ Retrieved {len(response['sources'])} source documents")
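
Note that `ask` runs retrieval twice: once inside the chain and once more to collect sources. One possible alternative is to retrieve once and reuse the same documents for both the answer and the citations. In the sketch below, `retrieve` and `generate` are hypothetical callables standing in for the retriever and the prompt-plus-LLM step, and the stubs exist only so the example runs without an API key.

```python
def answer_with_sources(question, retrieve, generate):
    """Retrieve once, then use the same documents for the answer and the citations."""
    docs = retrieve(question)
    context = "\n\n".join(doc["page_content"] for doc in docs)
    answer = generate(context, question)
    return {"answer": answer, "sources": docs}


# Stubs so the sketch runs offline
calls = {"retrieve": 0}


def fake_retrieve(question):
    calls["retrieve"] += 1
    return [{"page_content": "Bias and fairness concerns", "metadata": {"source": "ml_guide.txt"}}]


def fake_generate(context, question):
    return f"Based on {len(context)} characters of context: (answer to {question!r})"


response = answer_with_sources("What are machine learning challenges?", fake_retrieve, fake_generate)
print(response["answer"])
print(calls["retrieve"])  # 1
```

Retrieving once halves the embedding calls per question and guarantees that the cited sources are the ones the answer was generated from.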

---

# Section 8: Citations & Source Attribution

In [None]:
# Citations with LCEL
def ask_with_citations(question):
    """Ask question and return answer with source citations"""
    # Get answer from chain
    answer = rag_chain.invoke(question)
    
    # Get source documents separately
    source_docs = retriever.invoke(question)
    
    # Format citations
    citations = []
    for i, doc in enumerate(source_docs, 1):
        source = doc.metadata.get("source", "Unknown")
        citations.append(f"[{i}] {source}: {doc.page_content[:100]}...")
    
    return f"{answer}\n\nSources:\n" + "\n".join(citations)

# Test with citations
result = ask_with_citations("What is deep learning?")
print(result)
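
The citation format above can be extended with a relevance score per source. The sketch below assumes retrieval results arrive as `(document, score)` pairs; the helper name `format_scored_citations`, the dict-based documents, and the scores are made up for illustration. Whether a higher or a lower score means "more relevant" depends on the vector store, so the function only displays the score and does not reorder anything.

```python
def format_scored_citations(scored_docs, preview_chars=60):
    """Format (document, score) pairs as numbered citation lines."""
    lines = []
    for i, (doc, score) in enumerate(scored_docs, start=1):
        source = doc["metadata"].get("source", "Unknown")
        preview = doc["page_content"][:preview_chars]
        lines.append(f"[{i}] {source} (score={score:.2f}): {preview}...")
    return "\n".join(lines)


scored_docs = [
    ({"page_content": "Deep learning is a subset of machine learning.",
      "metadata": {"source": "ml_guide.txt"}}, 0.31),
    ({"page_content": "Popular frameworks include TensorFlow.",
      "metadata": {}}, 0.47),
]

print(format_scored_citations(scored_docs))
```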

---

# Section 9: Conversation Memory with LCEL

Build a conversational RAG system that remembers chat history:

In [None]:
# Conversational RAG using LCEL
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_core.messages import HumanMessage, AIMessage

# Helper to format docs
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Contextualize question prompt
contextualize_q_system_prompt = """Given a chat history and the latest user question \
which might reference context in the chat history, formulate a standalone question \
which can be understood without the chat history. Do NOT answer the question, \
just reformulate it if needed and otherwise return it as is."""

contextualize_q_prompt = ChatPromptTemplate.from_messages([
    ("system", contextualize_q_system_prompt),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

# Create contextualized question chain
contextualize_q_chain = contextualize_q_prompt | llm | StrOutputParser()

# QA prompt with history
qa_system_prompt = """You are an assistant for question-answering tasks. \
Use the following pieces of retrieved context to answer the question. \
If you don't know the answer, say that you don't know.

Context: {context}"""

qa_prompt = ChatPromptTemplate.from_messages([
    ("system", qa_system_prompt),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

# Build conversational RAG chain with LCEL
conversational_rag_chain = (
    RunnablePassthrough.assign(
        context=lambda x: format_docs(
            retriever.invoke(
                contextualize_q_chain.invoke(x) if x.get("chat_history") else x["input"]
            )
        )
    )
    | qa_prompt
    | llm
    | StrOutputParser()
)

# Example usage
chat_history = []

# First question
response1 = conversational_rag_chain.invoke({
    "input": "What is supervised learning?",
    "chat_history": chat_history
})

chat_history.extend([
    HumanMessage(content="What is supervised learning?"),
    AIMessage(content=response1)
])

# Second question (references first)
response2 = conversational_rag_chain.invoke({
    "input": "What about unsupervised learning?",
    "chat_history": chat_history
})

print("Q1:", response1)
print("\nQ2:", response2)
print("\n✅ Conversational RAG using LCEL!")
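
The routing inside the `context=lambda x: ...` step can be read in isolation: the question is rewritten only when there is chat history to resolve against, and otherwise it is used for retrieval as is. The sketch below restates that logic with a stub in place of the LLM call; `choose_search_query` and `fake_rewrite` are hypothetical names.

```python
def choose_search_query(inputs, rewrite_question):
    """Rewrite the question only when there is chat history to resolve against."""
    if inputs.get("chat_history"):
        return rewrite_question(inputs)
    return inputs["input"]


def fake_rewrite(inputs):
    # Stand-in for the LLM call; a real rewrite would use the history's content.
    return f"{inputs['input']} (rewritten with {len(inputs['chat_history'])} prior messages)"


first_turn = {"input": "What is supervised learning?", "chat_history": []}
follow_up = {
    "input": "What about unsupervised learning?",
    "chat_history": ["What is supervised learning?", "It learns from labeled data."],
}

print(choose_search_query(first_turn, fake_rewrite))
print(choose_search_query(follow_up, fake_rewrite))
```

Skipping the rewrite on the first turn saves one LLM call per conversation.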

---

# Section 10: Summary & Next Steps

## What You Learned

✅ LangChain basics for RAG  
✅ Latest 2025 APIs (modular packages, LCEL)  
✅ Complete RAG pipeline  
✅ Citations and memory  

## Key Updates (2025)

| Old | New |
|-----|-----|
| `langchain.document_loaders` | `langchain_community.document_loaders` |
| `langchain.text_splitter` | `langchain_text_splitters` |
| `langchain.vectorstores` | `langchain_chroma` |
| `RetrievalQA` | LCEL pipeline (pipe syntax) |

## Next Steps

🔮 LangChain/LangGraph advanced patterns  
🔮 Function calling & agents  
🔮 Security & guardrails  

**Congratulations! You've built a production-ready RAG system!** 🎉