# Week 3: Advanced LangChain & RAG (Retrieval-Augmented Generation)

## 📚 Session Overview

**Duration:** 2 hours  
**Week:** 3  
**Instructor-Led Session**

---

## 🎯 Learning Objectives

By the end of this session, you will be able to:
1. Understand what RAG is and why it's important
2. Work with embeddings and vector stores
3. Set up and use PGVector for semantic search
4. Process and chunk documents effectively
5. Build complete RAG applications with LangChain
6. Implement conversational RAG with memory

---

## 📋 Prerequisites

- ✅ Completed Weeks 1 & 2
- ✅ Understanding of LangChain chains
- ✅ PostgreSQL with the PGVector extension installed, and the target database (e.g. `ai_agent_course`) already created
- ✅ Docker running (if you run PostgreSQL/PGVector in a container)

---

## ⏱️ Estimated Time

- Setup & Introduction: 10 minutes
- Section 1 (RAG Introduction): 20 minutes
- Section 2 (Embeddings & Vectors): 25 minutes
- Section 3 (Document Processing): 20 minutes
- Section 4 (Building RAG Apps): 35 minutes
- Section 5 (Conversational RAG): 15 minutes
- Wrap-up & Q&A: 5 minutes

---

## 🔧 Setup

```python
# Import required libraries
import os
from dotenv import load_dotenv

# LangChain imports
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import PGVector
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader, PyPDFLoader
from langchain_core.messages import AIMessage, HumanMessage

# Standard imports
import textwrap

# Load environment variables (OPENAI_API_KEY, and optionally DATABASE_URL, from .env)
load_dotenv()

print("✅ Setup complete!")
```

```
✅ Setup complete!
```


---

# Section 1: Introduction to RAG (20 minutes)

## What is RAG?

**RAG (Retrieval-Augmented Generation)** combines a retrieval system with a generative model: relevant documents are fetched first and passed to the LLM as context for its answer.

### The Problem RAG Solves

**Without RAG:**
- ❌ LLMs have a knowledge cutoff date
- ❌ Can't access private/proprietary data
- ❌ Limited to what was in the training data
- ❌ May hallucinate facts

**With RAG:**
- ✅ Access to current information
- ✅ Use your own documents/data
- ✅ Responses grounded in retrieved documents (reduces, but does not eliminate, hallucination)
- ✅ Can cite the sources used

---

## How RAG Works

```
User Question
     ↓
1. Convert to embedding (vector)
     ↓
2. Search vector database for similar documents
     ↓
3. Retrieve top K most relevant documents
     ↓
4. Combine question + retrieved docs in prompt
     ↓
5. LLM generates answer based on context
     ↓
Answer with sources
```
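The five steps can be traced in plain Python. This is a toy sketch, not how LangChain implements it: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, and step 5 is left out, so the function stops at the assembled prompt.

```python
import math
from collections import Counter

def toy_embed(text: str) -> Counter:
    # Hypothetical stand-in for an embedding model: lowercase word counts
    cleaned = text.lower().replace("?", "").replace(".", "")
    return Counter(cleaned.split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

docs = [
    "LangChain is a framework for building LLM applications.",
    "PGVector adds vector similarity search to PostgreSQL.",
    "Pizza is a popular Italian dish.",
]

def build_rag_prompt(question: str, k: int = 2) -> str:
    q_vec = toy_embed(question)                                # 1. embed the question
    scored = [(cosine(q_vec, toy_embed(d)), d) for d in docs]  # 2. score every document
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top_k = [d for _, d in scored[:k]]                         # 3. keep the k best
    context = "\n".join(top_k)
    # 4. combine question + retrieved docs; step 5 would send this prompt to an LLM
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_rag_prompt("What does PGVector add to PostgreSQL?", k=1))
```

A real system swaps `toy_embed` for an embedding model and the list scan for a vector database query, but the shape of the pipeline stays the same.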

---

## Key Components

### 1. **Embeddings**
- Convert text to numerical vectors (arrays of numbers)
- Similar meaning = similar vectors
- Example: OpenAI's `text-embedding-3-small` creates 1536-dimensional vectors by default

### 2. **Vector Store**
- Database optimized for vector similarity search
- Examples: PGVector, Pinecone, Chroma, FAISS
- We'll use **PGVector** (PostgreSQL extension)

### 3. **Document Loaders**
- Load documents from various sources
- PDF, TXT, CSV, web pages, etc.

### 4. **Text Splitters**
- Break documents into smaller chunks
- Preserve semantic meaning
- Manage token limits

### 5. **Retriever**
- Searches vector store
- Returns most relevant chunks

---

## Use Cases

- **Customer Support:** Answer questions from documentation
- **Research Assistant:** Search through papers/articles
- **Internal Knowledge Base:** Company policies, procedures
- **Legal/Compliance:** Search contracts and regulations
- **Code Assistant:** Search codebase and documentation

---


# Section 2: Embeddings & Vector Stores (25 minutes)

## Understanding Embeddings

Embeddings are **numerical representations** of text that capture semantic meaning.

### How Embeddings Work

```
Text: "The cat sits on the mat"
     ↓ (Embedding Model)
Vector: [0.23, -0.45, 0.67, ..., 0.12]  # 1536 numbers
```

**Similar texts → similar vectors** (illustrative values, truncated to three numbers):
```
"cat on mat"     → [0.23, -0.45, 0.67, ...]
"feline on rug"  → [0.25, -0.43, 0.69, ...]  # Very similar!
"car in garage"  → [0.89, 0.12, -0.34, ...]  # Very different!
```

---

## 2.1: Creating Embeddings

```python
# Initialize embedding model
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",
    openai_api_key=os.getenv("OPENAI_API_KEY")
)

# Create embedding for a single text
text = "What is artificial intelligence?"
embedding_vector = embeddings.embed_query(text)

print("📊 Embedding Info:")
print(f"Text: {text}")
print(f"Vector dimension: {len(embedding_vector)}")
print(f"First 10 values: {embedding_vector[:10]}")
print(f"Vector type: {type(embedding_vector)}")
```

```
📊 Embedding Info:
Text: What is artificial intelligence?
Vector dimension: 1536
First 10 values: [0.006162669975310564, -0.01451562624424696, -0.03586096316576004, 0.0057395463809370995, 0.021922778338193893, -0.037055667489767075, -0.03623928874731064, 0.014804346486926079, -0.01701454445719719, 0.023894036188721657]
Vector type: <class 'list'>
```


### Embedding Multiple Documents

```python
# Embed multiple documents
documents = [
    "Python is a programming language.",
    "Machine learning is a subset of AI.",
    "Neural networks are inspired by the human brain."
]

doc_embeddings = embeddings.embed_documents(documents)

print(f"📚 Embedded {len(doc_embeddings)} documents")
print(f"Each embedding has {len(doc_embeddings[0])} dimensions")
```

```
📚 Embedded 3 documents
Each embedding has 1536 dimensions
```


### Similarity Between Embeddings

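We can measure similarity using **cosine similarity** (the cosine of the angle between two vectors) or the **dot product**. For vectors normalized to length 1, as OpenAI's embeddings are, the two give the same result.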

```python
import numpy as np

def cosine_similarity(vec1, vec2):
    """Calculate cosine similarity between two vectors."""
    dot_product = np.dot(vec1, vec2)
    norm1 = np.linalg.norm(vec1)
    norm2 = np.linalg.norm(vec2)
    return dot_product / (norm1 * norm2)

# Compare similarities
texts = [
    "I love programming in Python",
    "Python is my favorite coding language",
    "I enjoy eating pizza"
]

vecs = [embeddings.embed_query(t) for t in texts]

print("🔍 Similarity Scores:")
print(f"Text 1 vs Text 2 (similar): {cosine_similarity(vecs[0], vecs[1]):.4f}")
print(f"Text 1 vs Text 3 (different): {cosine_similarity(vecs[0], vecs[2]):.4f}")
print(f"Text 2 vs Text 3 (different): {cosine_similarity(vecs[1], vecs[2]):.4f}")
```

```
🔍 Similarity Scores:
Text 1 vs Text 2 (similar): 0.6870
Text 1 vs Text 3 (different): 0.3085
Text 2 vs Text 3 (different): 0.3102
```
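Cosine similarity and dot product coincide once both vectors are scaled to unit length, which is why either can be used for normalized embeddings. A quick numerical check, using random 1536-dimensional vectors as hypothetical stand-ins for real embeddings:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
a = rng.normal(size=1536)  # random stand-ins for two embeddings
b = rng.normal(size=1536)

cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Scale each vector to unit length (L2 norm = 1)
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
dot_of_units = np.dot(a_unit, b_unit)

print(f"cosine similarity:   {cosine:.6f}")
print(f"dot of unit vectors: {dot_of_units:.6f}")
print(f"same? {np.isclose(cosine, dot_of_units)}")
```

Skipping the two norm computations is the only saving, but it adds up when comparing a query against millions of stored vectors.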


---

## 2.2: Setting Up PGVector

**PGVector** is a PostgreSQL extension for storing and searching vector embeddings.

### Why PGVector?

- ✅ Built on reliable PostgreSQL
- ✅ ACID compliance (transactions)
- ✅ Can store vectors alongside regular data
- ✅ Exact and approximate nearest-neighbor search (HNSW and IVFFlat indexes)
- ✅ Open source and free

---

### Initialize PGVector Connection

```python
# Database connection string
# Format: postgresql://username:password@host:port/database
# The database itself must already exist: PGVector tries to enable the `vector`
# extension and create its tables, but it does not create the database.
CONNECTION_STRING = os.getenv(
    "DATABASE_URL",
    "postgresql://postgres:postgres@localhost:5432/ai_agent_course"
)

# Collection name: a named group of embeddings (LangChain keeps all
# collections in the same shared tables)
COLLECTION_NAME = "week3_demo"

print("✅ Connection string configured")
print(f"📦 Collection name: {COLLECTION_NAME}")
```

```
✅ Connection string configured
📦 Collection name: week3_demo
```


### Store Documents in PGVector

```python
from langchain_core.documents import Document

# Create sample documents
sample_docs = [
    Document(
        page_content="Python is a high-level programming language known for its simplicity.",
        metadata={"source": "python_intro.txt", "category": "programming"}
    ),
    Document(
        page_content="Machine learning enables computers to learn from data without explicit programming.",
        metadata={"source": "ml_basics.txt", "category": "AI"}
    ),
    Document(
        page_content="LangChain is a framework for developing applications powered by language models.",
        metadata={"source": "langchain_docs.txt", "category": "framework"}
    ),
    Document(
        page_content="Vector databases store embeddings and enable semantic search capabilities.",
        metadata={"source": "vector_db.txt", "category": "database"}
    )
]

# Create vector store and add documents.
# pre_delete_collection=True drops any existing collection with this name first,
# so re-running the cell doesn't store duplicate copies of the documents.
vectorstore = PGVector.from_documents(
    documents=sample_docs,
    embedding=embeddings,
    collection_name=COLLECTION_NAME,
    connection_string=CONNECTION_STRING,
    pre_delete_collection=True,
)

print(f"✅ Stored {len(sample_docs)} documents in PGVector")
```


### Search Vector Store (Similarity Search)

```python
# Perform similarity search
query = "What is a framework for building AI applications?"

results = vectorstore.similarity_search(query, k=2)

print(f"🔍 Query: {query}")
print(f"\n📊 Top {len(results)} Results:\n")

for i, doc in enumerate(results, 1):
    print(f"{i}. {doc.page_content}")
    print(f"   Source: {doc.metadata['source']}")
    print(f"   Category: {doc.metadata['category']}")
    print()
```

### Similarity Search with Scores

```python
# Get scores. With PGVector's default (cosine) distance strategy the score
# is a *distance*: lower means more similar.
results_with_scores = vectorstore.similarity_search_with_score(query, k=3)

print(f"🔍 Query: {query}")
print("\n📊 Results with Distance Scores (lower = more similar):\n")

for doc, score in results_with_scores:
    print(f"Distance: {score:.4f}")
    print(f"Content: {doc.page_content}")
    print(f"Source: {doc.metadata['source']}")
    print("-" * 60)
```
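PGVector's default distance strategy is cosine distance, so `similarity_search_with_score` returns distances where smaller is better. Cosine distance is defined as `1 - cosine similarity`, so converting back is a one-liner. A small sketch with hypothetical `(source, distance)` pairs, not real output from the cell above:

```python
# Hypothetical (source, cosine_distance) pairs shaped like similarity_search_with_score output
results = [("ml_basics.txt", 0.41), ("langchain_docs.txt", 0.18), ("vector_db.txt", 0.47)]

def distance_to_similarity(distance: float) -> float:
    # Cosine distance = 1 - cosine similarity, so invert it
    return 1.0 - distance

# Smallest distance first = best match first
for source, dist in sorted(results, key=lambda r: r[1]):
    print(f"{source:<20} distance={dist:.2f}  similarity={distance_to_similarity(dist):.2f}")
```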

### Filter by Metadata

```python
# Search with metadata filter
filtered_results = vectorstore.similarity_search(
    query="programming",
    k=5,
    filter={"category": "programming"}
)

print("🔍 Filtered Search (category='programming'):")
for doc in filtered_results:
    print(f"- {doc.page_content}")
    print(f"  Category: {doc.metadata['category']}\n")
```
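Conceptually, a metadata filter narrows the candidate set and similarity ranking happens only among the documents that match. The plain-Python sketch below illustrates that idea with hypothetical toy 2-D vectors; it is not how PGVector executes the query (there the filter is applied inside the SQL query itself).

```python
import math

# Hypothetical documents with toy 2-D "embeddings" and metadata
docs = [
    {"text": "Python is a high-level programming language.", "vec": (0.9, 0.1), "category": "programming"},
    {"text": "LangChain is a framework for LLM apps.", "vec": (0.8, 0.3), "category": "framework"},
    {"text": "Loops repeat a block of code.", "vec": (0.7, 0.2), "category": "programming"},
]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))

def filtered_search(query_vec, k, metadata_filter):
    # 1. Keep only documents whose metadata matches every key/value in the filter
    candidates = [d for d in docs
                  if all(d.get(key) == value for key, value in metadata_filter.items())]
    # 2. Rank the surviving candidates by similarity to the query vector
    candidates.sort(key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return candidates[:k]

for d in filtered_search((1.0, 0.2), k=5, metadata_filter={"category": "programming"}):
    print(d["text"])
```

Note that `k=5` still returns only two documents here: the filter runs first, so you can never get more results than there are matching documents.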

---

# Section 3: Document Processing (20 minutes)

## Why Document Processing Matters

**Challenges:**
- Documents are often too long for LLM context windows
- Need to break into meaningful chunks
- Must preserve context and relationships
- Different file formats require different handling

**Solution:**
- **Document Loaders:** Extract text from various formats
- **Text Splitters:** Intelligently chunk documents
- **Metadata:** Track source and context

---

## 3.1: Document Loaders

```python
# Create a sample text file
sample_text = """
Introduction to Artificial Intelligence

Artificial Intelligence (AI) is the simulation of human intelligence by machines.
AI systems can perform tasks that typically require human intelligence, such as
visual perception, speech recognition, decision-making, and language translation.

Types of AI:
1. Narrow AI: Designed for specific tasks (e.g., image recognition)
2. General AI: Theoretical AI with human-like intelligence
3. Super AI: Hypothetical AI that surpasses human intelligence

Machine Learning is a subset of AI that enables systems to learn from data.
Deep Learning is a subset of Machine Learning using neural networks.
"""

# Save to file
with open("sample_ai_doc.txt", "w") as f:
    f.write(sample_text)

print("✅ Sample document created")
```

```python
# Load text document
loader = TextLoader("sample_ai_doc.txt")
documents = loader.load()

print(f"📄 Loaded {len(documents)} document(s)")
print("\nDocument content preview:")
print(documents[0].page_content[:200] + "...")
print(f"\nMetadata: {documents[0].metadata}")
```

---

## 3.2: Text Splitting Strategies

### Why Split Text?

1. **Token Limits:** LLMs have maximum context length
2. **Relevance:** Smaller chunks = more precise retrieval
3. **Performance:** Less context per query means smaller prompts and faster responses
4. **Cost:** Only send relevant context to LLM

### Key Parameters:

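- **chunk_size:** Maximum size of each chunk, measured by `length_function` (characters by default)
- **chunk_overlap:** Number of characters repeated between consecutive chunks, so context that straddles a boundary isn't lost
- **separators:** Ordered list of strings to split on (paragraphs, lines, words, ...)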
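To see what `chunk_size` and `chunk_overlap` do in isolation, here is a naive fixed-window character splitter. `fixed_size_chunks` is a hypothetical helper for illustration only; unlike LangChain's splitters it ignores separators and will happily cut mid-word.

```python
def fixed_size_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Naive splitter: windows of chunk_size characters, each starting
    (chunk_size - chunk_overlap) characters after the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reached the end
            break
    return chunks

for chunk in fixed_size_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3):
    print(chunk)
# abcdefghij
# hijklmnopq
# opqrstuvwx
# vwxyz
```

Each chunk repeats the last 3 characters of the previous one ("hij", "opq", "vwx"), which is exactly what overlap buys you: text near a boundary appears in both neighboring chunks.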

---

### RecursiveCharacterTextSplitter

Tries to split on different separators in order of preference:
1. Double newlines (paragraphs)
2. Single newlines (lines)
3. Spaces (words)
4. Characters (as last resort)
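The fallback idea can be sketched in a few dozen lines. `recursive_split` is a simplified, hypothetical re-implementation for intuition only: it has no overlap and merges pieces more crudely than LangChain's `RecursiveCharacterTextSplitter`, so its chunk boundaries will not match the real class.

```python
def recursive_split(text: str, chunk_size: int, separators=("\n\n", "\n", " ", "")) -> list[str]:
    """Split text into pieces of at most chunk_size characters,
    preferring earlier (coarser) separators and falling back to finer ones."""
    if len(text) <= chunk_size:
        return [text] if text else []
    # Pick the first separator that actually occurs in the text ("" always matches)
    sep = next(s for s in separators if s == "" or s in text)
    remaining = separators[separators.index(sep) + 1:]
    pieces = list(text) if sep == "" else text.split(sep)

    chunks, current = [], ""
    for piece in pieces:
        if not piece:
            continue
        if len(piece) > chunk_size:
            # Piece is still too big: flush what we have, then recurse with finer separators
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, chunk_size, remaining))
            continue
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate      # keep merging small pieces into the current chunk
        else:
            chunks.append(current)   # current chunk is full; start a new one
            current = piece
    if current:
        chunks.append(current)
    return chunks

text = "First paragraph is here.\n\nSecond paragraph is a bit longer than the first one."
for chunk in recursive_split(text, chunk_size=40):
    print(repr(chunk))
```

The first paragraph fits, so it stays whole; the second is too long, so the function drops down to splitting on spaces and regroups words into chunks of at most 40 characters.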

```python
# Create text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,      # Maximum characters per chunk
    chunk_overlap=50,    # Characters shared between consecutive chunks
    length_function=len,
    separators=["\n\n", "\n", " ", ""]
)

# Split the document
chunks = text_splitter.split_documents(documents)

print(f"📄 Original document split into {len(chunks)} chunks\n")

for i, chunk in enumerate(chunks, 1):
    print(f"Chunk {i} (length: {len(chunk.page_content)}):")
    print(chunk.page_content)
    print("-" * 60)
```

### Comparing Different Chunk Sizes

```python
# Compare different chunk sizes
chunk_sizes = [100, 300, 500]

for size in chunk_sizes:
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=size,
        chunk_overlap=20
    )
    # Separate variable so `chunks` from the previous cell is left unchanged
    size_chunks = splitter.split_documents(documents)
    print(f"Chunk size {size}: {len(size_chunks)} chunks created")
```

### Adding Custom Metadata

```python
# Add custom metadata to chunks
for i, chunk in enumerate(chunks):
    chunk.metadata["chunk_id"] = i
    chunk.metadata["chunk_size"] = len(chunk.page_content)
    chunk.metadata["document_name"] = "AI Introduction"

print("Enhanced metadata for first chunk:")
print(chunks[0].metadata)
```

---

# Section 4: Building RAG Applications (35 minutes)

Now let's put it all together to build a complete RAG system!

## 4.1: Complete RAG Pipeline

```python
# Step 1: Load and split documents
loader = TextLoader("sample_ai_doc.txt")
documents = loader.load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,
    chunk_overlap=50
)
splits = text_splitter.split_documents(documents)

print(f"📄 Loaded and split into {len(splits)} chunks")

# Step 2: Create vector store (pre_delete_collection=True avoids duplicate
# chunks if this cell is run more than once)
vectorstore = PGVector.from_documents(
    documents=splits,
    embedding=embeddings,
    collection_name="rag_demo",
    connection_string=CONNECTION_STRING,
    pre_delete_collection=True,
)

print("✅ Vector store created")

# Step 3: Create retriever
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 3}  # Retrieve top 3 chunks
)

print("✅ Retriever configured")
```

## 4.2: Test Retrieval

```python
# Test the retriever (.invoke replaces the deprecated get_relevant_documents)
query = "What are the types of AI?"
retrieved_docs = retriever.invoke(query)

print(f"🔍 Query: {query}")
print(f"\n📚 Retrieved {len(retrieved_docs)} relevant chunks:\n")

for i, doc in enumerate(retrieved_docs, 1):
    print(f"Chunk {i}:")
    print(textwrap.fill(doc.page_content, width=80))
    print("-" * 80)
```

## 4.3: Build RAG Chain with LCEL

```python
# Initialize LLM
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

# Create RAG prompt template (instructions first, then context and question)
rag_prompt = ChatPromptTemplate.from_template("""
Answer the question based only on the following context. Provide a clear and
concise answer. If the answer cannot be found in the context, say
"I don't have enough information to answer that."

Context:
{context}

Question: {question}

Answer:""")

# Helper function to format documents
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Build the RAG chain using LCEL: the dict feeds the same question to both branches
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | llm
    | StrOutputParser()
)

print("✅ RAG chain created")
```
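If the dict-plus-pipe syntax feels opaque, this plain-Python function traces the same data flow. It is an illustrative equivalent only: `fake_retrieve` and `fake_llm` are hypothetical stubs standing in for the retriever and `llm | StrOutputParser()`, and the real LCEL chain additionally offers `batch`, `stream`, and async variants that this function does not.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    # Minimal stand-in for langchain_core.documents.Document
    page_content: str

def fake_retrieve(question: str) -> list:
    # Hypothetical stub for `retriever`: always returns the same two chunks
    return [Doc("Narrow AI is designed for specific tasks."),
            Doc("General AI is a theoretical AI with human-like intelligence.")]

def format_docs(docs) -> str:
    return "\n\n".join(doc.page_content for doc in docs)

def fill_prompt(context: str, question: str) -> str:
    # Plays the role of `rag_prompt`: fills the {context} and {question} slots
    return (
        "Answer the question based only on the following context:\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

def fake_llm(prompt: str) -> str:
    # Hypothetical stub for `llm | StrOutputParser()`: returns a plain string
    return f"(answer generated from a {len(prompt)}-character prompt)"

def rag_chain_equivalent(question: str) -> str:
    # The dict step: the same question feeds both branches
    inputs = {
        "context": format_docs(fake_retrieve(question)),  # retriever | format_docs
        "question": question,                            # RunnablePassthrough()
    }
    # | rag_prompt | llm | StrOutputParser()
    return fake_llm(fill_prompt(inputs["context"], inputs["question"]))

print(rag_chain_equivalent("What are the types of AI?"))
```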

## 4.4: Query the RAG System

```python
# Ask questions
questions = [
    "What is Artificial Intelligence?",
    "What are the three types of AI mentioned?",
    "How is Machine Learning related to AI?",
    "What is quantum computing?"  # Not in the context
]

for question in questions:
    print(f"\n❓ Question: {question}")
    answer = rag_chain.invoke(question)
    print(f"💡 Answer: {answer}")
    print("-" * 80)
```

## 4.5: RAG with Source Citations

```python
# Enhanced RAG that returns the answer together with the exact chunks used
def rag_with_sources(question: str) -> dict:
    """RAG that returns answer with source documents."""

    # Retrieve documents once
    docs = retriever.invoke(question)

    # Format context from those same documents
    context = format_docs(docs)

    # Generate the answer from that context (calling rag_chain here would run
    # a second, separate retrieval and ignore `context`)
    answer = (rag_prompt | llm | StrOutputParser()).invoke(
        {"context": context, "question": question}
    )

    return {
        "question": question,
        "answer": answer,
        "sources": docs
    }

# Test with sources
result = rag_with_sources("What is Deep Learning?")

print(f"❓ Question: {result['question']}")
print(f"\n💡 Answer: {result['answer']}")
print("\n📚 Sources:")
for i, doc in enumerate(result['sources'], 1):
    print(f"\n{i}. {doc.metadata.get('source', 'Unknown')}")
    print(textwrap.fill(doc.page_content, width=70, initial_indent='   ', subsequent_indent='   '))
```
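Listing sources after the answer shows what was retrieved, but not which chunk supports which claim. One common refinement is to number each chunk in the context so the model can cite `[1]`, `[2]`, matching the printed source list. A minimal sketch, assuming a hypothetical `format_docs_with_ids` helper; you would pass its output as `context` and add an instruction such as "cite sources by number" to the prompt.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    # Minimal stand-in for langchain_core.documents.Document
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_docs_with_ids(docs) -> str:
    """Prefix each chunk with [n] and its source so answers can cite by number."""
    return "\n\n".join(
        f"[{i}] (source: {doc.metadata.get('source', 'Unknown')})\n{doc.page_content}"
        for i, doc in enumerate(docs, 1)
    )

docs = [
    Doc("Deep Learning is a subset of Machine Learning using neural networks.",
        {"source": "sample_ai_doc.txt"}),
    Doc("Machine Learning is a subset of AI that enables systems to learn from data.",
        {"source": "sample_ai_doc.txt"}),
]
print(format_docs_with_ids(docs))
```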

---

# Section 5: Conversational RAG (15 minutes)

Pass the chat history along with each question so the RAG system can resolve follow-up questions that refer to earlier turns.

## 5.1: RAG with Conversation History

```python
from langchain_core.prompts import MessagesPlaceholder
from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

contextualize_q_system_prompt = """Given a chat history and the latest user question \
which might reference context in the chat history, formulate a standalone question \
which can be understood without the chat history. Do NOT answer the question, \
just reformulate it if needed and otherwise return it as is."""

contextualize_q_prompt = ChatPromptTemplate.from_messages([
    ("system", contextualize_q_system_prompt),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)

qa_system_prompt = """You are an assistant for question-answering tasks. \
Use the following pieces of retrieved context to answer the question. \
If you don't know the answer, just say that you don't know. \
Keep the answer concise.

{context}"""

qa_prompt = ChatPromptTemplate.from_messages([
    ("system", qa_system_prompt),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

conversational_rag = create_retrieval_chain(
    history_aware_retriever, question_answer_chain
)

print("✅ Conversational RAG chain created")
```
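`create_history_aware_retriever` branches on whether there is any chat history: with an empty history the input goes straight to the retriever; otherwise the prompt and LLM first rewrite it into a standalone question. The sketch below mirrors that control flow with hypothetical stubs (`rewrite_question`, `retrieve`) instead of real LangChain objects.

```python
def rewrite_question(chat_history: list[tuple[str, str]], question: str) -> str:
    # Hypothetical stub for `contextualize_q_prompt | llm`: a real LLM would
    # resolve references like "Which one" using the history
    last_reply = chat_history[-1][1]
    return f"{question} (in the context of: {last_reply})"

def retrieve(query: str) -> list[str]:
    # Hypothetical stub for `retriever`
    return [f"chunk matching '{query}'"]

def history_aware_retrieve(inputs: dict) -> list[str]:
    history = inputs.get("chat_history", [])
    if not history:
        # No history: the question is used as-is
        return retrieve(inputs["input"])
    # With history: reformulate into a standalone question first, then retrieve
    return retrieve(rewrite_question(history, inputs["input"]))

print(history_aware_retrieve({"input": "What is AI?", "chat_history": []}))
print(history_aware_retrieve({
    "input": "Which one is theoretical?",
    "chat_history": [("human", "Can you tell me about the types?"),
                     ("ai", "Narrow AI, General AI, and Super AI.")],
}))
```

This is why a follow-up like "Which one is theoretical?" still retrieves the right chunks: the retriever never sees the ambiguous question on its own.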

## 5.2: Have a Conversation

```python
questions_sequence = [
    "What is Artificial Intelligence?",
    "Can you tell me about the types?",
    "Which one is theoretical?",
    "How does Machine Learning fit into this?"
]

chat_history = []

print("🗣️ Starting Conversation:\n")
print("=" * 80)

for question in questions_sequence:
    print(f"\n👤 User: {question}")

    result = conversational_rag.invoke({
        "input": question,
        "chat_history": chat_history
    })

    chat_history.append(HumanMessage(content=question))
    chat_history.append(AIMessage(content=result["answer"]))

    print(f"🤖 Assistant: {result['answer']}")
    print("-" * 80)
```

## 5.3: View Conversation History

```python
print("📜 Conversation History:")
print("=" * 80)

for message in chat_history:
    role = "👤 User" if isinstance(message, HumanMessage) else "🤖 Assistant"
    print(f"\n{role}: {message.content}")
    print("-" * 80)

print(f"\nTotal messages in history: {len(chat_history)}")
print(f"Number of exchanges: {len(chat_history) // 2}")
```

---

# 🎯 Summary & Key Takeaways

## What We Learned:

### 1. **RAG Fundamentals**
- What RAG is and why it's powerful
- How RAG solves LLM limitations
- RAG architecture and workflow

### 2. **Embeddings & Vectors**
- Text embeddings capture semantic meaning
- Vector similarity measures relatedness
- OpenAI embeddings API usage

### 3. **PGVector**
- Setting up PostgreSQL with PGVector
- Storing and searching vectors
- Metadata filtering
- Similarity search with scores

### 4. **Document Processing**
- Document loaders for different formats
- Text splitting strategies
- Chunk size and overlap considerations
- Metadata management

### 5. **Building RAG Applications**
- Complete RAG pipeline with LCEL
- Retriever configuration
- Source citation
- Conversational RAG with memory

---

## 📝 Next Steps:

### Exercises for This Week:

**Exercise 1 (Due Monday):** `02_exercise_knowledge_base.ipynb`
- Build personal knowledge base with RAG
- Upload and process multiple documents
- Implement Q&A with citations

**Exercise 2 (Due Friday):** `03_exercise_research_assistant.ipynb`
- Multi-document research assistant
- Metadata filtering and organization
- Advanced retrieval strategies

---

## 🤔 Reflection Questions:

1. When should you use RAG vs fine-tuning?
2. How does chunk size affect retrieval quality?
3. What are the trade-offs of different text splitting strategies?
4. How can you improve RAG accuracy?

---

## 📚 Additional Resources:

- [LangChain RAG Tutorial](https://python.langchain.com/docs/use_cases/question_answering/)
- [PGVector Documentation](https://github.com/pgvector/pgvector)
- [OpenAI Embeddings Guide](https://platform.openai.com/docs/guides/embeddings)

---

**Next Week:** Introduction to LangGraph! üöÄ