# LAB 1: Build a Complete RAG Chatbot

**Duration**: ~1 hour hands-on lab

## Prerequisites

✅ **Completed Notebook 03 (LangChain Essentials)** - Document loaders, text splitters, vector stores, LCEL, RAG fundamentals  
✅ OpenAI API key  

**Note**: This lab applies the concepts from Notebook 03 in a complete hands-on project.

---

## What You'll Build

A complete RAG (Retrieval-Augmented Generation) chatbot with:
- Document loading and chunking
- Vector storage with ChromaDB
- Semantic search
- Conversation memory
- Source citations

## Learning Objectives

By the end of this lab, you'll be able to:
- Load and process documents for RAG
- Generate and store embeddings in ChromaDB
- Build a retrieval chain with LangChain
- Add conversation memory for multi-turn conversations
- Implement source citation for transparency

---

In [None]:
# Uninstall existing langchain packages (Colab clean slate)
!pip uninstall -y langchain langchain-core langchain-community langchain-openai langchain-chroma langchain-text-splitters

# Install with compatible versions (let pip resolve dependencies)
!pip install -qU \
    openai \
    langchain \
    langchain-openai \
    langchain-chroma \
    langchain-community \
    langchain-text-splitters \
    pypdf \
    chromadb

# Show installed versions for verification
!pip list | grep langchain

print("\n✅ Packages installed!")

In [None]:
# Setup API keys
import os
from getpass import getpass

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")

print("✅ API key configured!")

---

## Step 1: Document Loading and Chunking

First, we'll load documents and split them into chunks for processing.

**Why chunking?**
- LLMs have token limits
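- Smaller chunks produce more focused embeddings, which makes semantic search more precise
- Chunk size is a balance: chunks that are too small lose surrounding context, while chunks that are too large pull in irrelevant text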
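
To make the trade-off concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It is a simplified illustration of the idea, not how LangChain's splitter is implemented; `chunk_text`, its parameters, and the alphabet sample are hypothetical names and data chosen for this example.

```python
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into fixed-size character windows that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return pieces


# Illustrative input: the alphabet makes the shared characters easy to spot
toy_chunks = chunk_text("abcdefghijklmnopqrstuvwxyz", chunk_size=10, overlap=3)
for number, piece in enumerate(toy_chunks, start=1):
    print(number, piece)
```

Each chunk starts with the last three characters of the previous one, so a sentence that straddles a boundary still appears intact in at least one chunk as long as it is shorter than the overlap.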

---

In [None]:
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter  # ✅ Updated import (2025)

# Create a sample document
sample_doc = """Artificial Intelligence (AI) is transforming how we work and live. Machine Learning,
a subset of AI, enables computers to learn from data without explicit programming.

Deep Learning is a type of Machine Learning that uses neural networks with multiple layers.
It excels at tasks like image recognition, natural language processing, and speech recognition.

Natural Language Processing (NLP) is a field of AI focused on enabling computers to understand,
interpret, and generate human language. Applications include chatbots, translation, and sentiment analysis.

Retrieval-Augmented Generation (RAG) combines retrieval of relevant documents with text generation.
This approach helps LLMs provide more accurate and grounded responses by referencing external knowledge.

Vector databases store embeddings - numerical representations of text that capture semantic meaning.
Popular vector databases include ChromaDB, Pinecone, and Weaviate."""

# Save sample document
with open('sample_doc.txt', 'w') as f:
    f.write(sample_doc)

# Load document
loader = TextLoader('sample_doc.txt')
documents = loader.load()

print(f"Loaded {len(documents)} document(s)")
print(f"Document content preview: {documents[0].page_content[:100]}...")

In [None]:
# Split into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,        # Characters per chunk
    chunk_overlap=50,       # Overlap to maintain context
    length_function=len,
    separators=["\n\n", "\n", ". ", " ", ""]
)

chunks = text_splitter.split_documents(documents)

print(f"\nSplit into {len(chunks)} chunks")
print("\nChunk examples:")
for i, chunk in enumerate(chunks[:3]):
    print(f"\nChunk {i+1}: {chunk.page_content}")

---

## Step 2: Generate Embeddings & Store in ChromaDB

Now we'll convert chunks into embeddings and store them in a vector database.

- **Embeddings**: Numerical representations that capture semantic meaning
- **ChromaDB**: Fast, open-source vector database
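
As a rough illustration of how vectors can capture meaning, the sketch below ranks a few phrases by cosine similarity. The three-dimensional vectors are made up for this example (real embedding models return hundreds or thousands of dimensions), and cosine similarity is only one common measure; a vector store may be configured with a different distance metric.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for a zero vector; treat it as unrelated
    return dot / (norm_a * norm_b)


# Made-up "embeddings" for illustration only
toy_embeddings = {
    "deep learning": [0.9, 0.1, 0.0],
    "neural networks": [0.8, 0.2, 0.1],
    "vector database": [0.1, 0.2, 0.9],
}

query_vector = toy_embeddings["deep learning"]
ranked = sorted(
    toy_embeddings,
    key=lambda name: cosine_similarity(query_vector, toy_embeddings[name]),
    reverse=True,
)
print(ranked)
```

Phrases whose vectors point in a similar direction rank closer to the query, which is the property semantic search relies on.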

---

In [None]:
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

# Initialize embeddings model
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Create vector store
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    collection_name="ai_knowledge_base",
    persist_directory="./chroma_db"
)

print("✅ Created vector store with embeddings")
print(f"Total documents in vector store: {vectorstore._collection.count()}")

In [None]:
# Test semantic search
query = "What is deep learning?"
results = vectorstore.similarity_search(query, k=2)

print(f"Query: {query}\n")
print("Top 2 most relevant chunks:")
for i, doc in enumerate(results):
    print(f"\n{i+1}. {doc.page_content}")

---

## Step 3: Build RAG Pipeline

Create a complete RAG system that:
1. Retrieves relevant documents
2. Augments the prompt with context
3. Generates a response with the LLM
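
The three steps can be sketched without any external services. This is a toy under stated assumptions: word overlap stands in for vector search, and `generate` is a placeholder rather than a real LLM call. All function names and the small corpus are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of words, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question: str, corpus: list[str], k: int = 2) -> list[str]:
    """Step 1: rank passages by shared words (a crude stand-in for vector search)."""
    question_words = tokenize(question)
    return sorted(
        corpus,
        key=lambda passage: len(question_words & tokenize(passage)),
        reverse=True,
    )[:k]


def augment(question: str, passages: list[str]) -> str:
    """Step 2: place the retrieved passages into the prompt as context."""
    context = "\n\n".join(passages)
    return f"Context: {context}\n\nQuestion: {question}\n\nAnswer:"


def generate(prompt_text: str) -> str:
    """Step 3: placeholder for the LLM call; it only reports the prompt size."""
    return f"[an LLM would answer here, given a {len(prompt_text)}-character prompt]"


toy_corpus = [
    "Deep Learning uses neural networks with multiple layers.",
    "Vector databases store embeddings of text.",
    "RAG combines retrieval with text generation.",
]
toy_question = "What is deep learning?"
toy_passages = retrieve(toy_question, toy_corpus, k=1)
toy_prompt = augment(toy_question, toy_passages)
print(toy_prompt)
print(generate(toy_prompt))
```

Keeping the steps as separate functions mirrors how the pipeline is composed: each stage can be swapped (for example, a real retriever for `retrieve`) without touching the others.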

---

In [None]:
# Build RAG Pipeline using LCEL (modern LangChain 1.x)
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Initialize LLM
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Create prompt template
template = """You are a helpful AI assistant. Use the following context to answer the question.
If you don't know the answer based on the context, say so.

Context: {context}

Question: {question}

Answer:"""

prompt = ChatPromptTemplate.from_template(template)

# Helper to format documents
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Create retrieval chain using LCEL
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print("✅ RAG pipeline created using LCEL!")

In [None]:
# Test the RAG pipeline
question = "What is the difference between Machine Learning and Deep Learning?"

# LCEL: invoke with question directly
answer = rag_chain.invoke(question)

print(f"Question: {question}\n")
print(f"Answer: {answer}")

### Add Citations

Let's add citations to show which sources informed the answer.
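
As a library-free illustration of the formatting step, the sketch below numbers sources, skips duplicate chunks, and truncates long previews. `Source` is a hypothetical stand-in for a retrieved document, and `format_citations` is an illustrative helper, not part of LangChain.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """Hypothetical stand-in for a retrieved document: text plus where it came from."""
    content: str
    origin: str


def format_citations(sources: list[Source], preview_chars: int = 60) -> str:
    """Build a numbered source list, skipping chunks that were already cited."""
    lines = ["Sources:"]
    seen = set()
    for source in sources:
        if source.content in seen:
            continue  # the same chunk can be retrieved more than once
        seen.add(source.content)
        preview = source.content[:preview_chars]
        suffix = "..." if len(source.content) > preview_chars else ""
        lines.append(f"[{len(seen)}] {source.origin}: {preview}{suffix}")
    return "\n".join(lines)


toy_sources = [
    Source("RAG combines retrieval with text generation.", "sample_doc.txt"),
    Source("RAG combines retrieval with text generation.", "sample_doc.txt"),
    Source("Vector databases store embeddings.", "sample_doc.txt"),
]
citation_block = format_citations(toy_sources)
print(citation_block)
```

Deduplicating matters in practice because overlapping chunks, or a collection that was populated twice, can return the same text under several hits.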

---

In [None]:
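def format_with_citations(question):
    """Answer a question and append the retrieved chunks as numbered sources."""
    # Retrieve once so the cited sources are exactly the chunks the LLM saw
    sources = retriever.invoke(question)

    # Reuse the prompt and LLM from the RAG pipeline with the retrieved context
    answer_chain = prompt | llm | StrOutputParser()
    answer = answer_chain.invoke({
        "context": format_docs(sources),
        "question": question,
    })

    # Add citations
    citation_text = "\n\nSources:"
    for i, doc in enumerate(sources):
        citation_text += f"\n[{i+1}] {doc.page_content[:100]}..."

    return answer + citation_text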

# Test with citations
formatted_response = format_with_citations("What is RAG?")
print(formatted_response)

---

## Step 4: Add Conversation Memory

Enable multi-turn conversations with memory.
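
Because the chat history is sent to the model on every turn, it is worth bounding its length. The sketch below keeps only the most recent turns; `trim_history` and the `(role, text)` tuple format are assumptions made for illustration, and the same slicing works on any list of message objects that alternates human and AI messages.

```python
def trim_history(history: list[tuple[str, str]], max_turns: int) -> list[tuple[str, str]]:
    """Keep only the messages from the most recent `max_turns` turns.

    Assumes strict alternation: one human message followed by one AI reply.
    """
    if max_turns <= 0:
        return []  # guard: history[-0:] would return the whole list
    return history[-2 * max_turns:]


history = [
    ("human", "What is NLP?"),
    ("ai", "NLP is a field of AI focused on human language."),
    ("human", "What are some applications of it?"),
    ("ai", "Chatbots, translation, and sentiment analysis."),
    ("human", "What is RAG?"),
    ("ai", "RAG combines retrieval with text generation."),
]
recent = trim_history(history, max_turns=2)
print(len(recent), recent[0])
```

Dropping old turns trades recall of early context for a smaller, cheaper prompt; summarizing older turns is another option that this sketch does not cover.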

---

In [None]:
# Conversational RAG using LCEL
from langchain_core.prompts import MessagesPlaceholder
from langchain_core.messages import HumanMessage, AIMessage
from langchain_core.runnables import RunnablePassthrough

# Helper to format documents
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Contextualize question prompt
contextualize_q_system_prompt = """Given a chat history and the latest user question
which might reference context in the chat history, formulate a standalone question
which can be understood without the chat history. Do NOT answer the question,
just reformulate it if needed and otherwise return it as is."""

contextualize_q_prompt = ChatPromptTemplate.from_messages([
    ("system", contextualize_q_system_prompt),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

# Create contextualized question chain
contextualize_q_chain = contextualize_q_prompt | llm | StrOutputParser()

# QA prompt with history
qa_system_prompt = """You are a helpful assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If you don't know the answer, say that you don't know.

Context: {context}"""

qa_prompt = ChatPromptTemplate.from_messages([
    ("system", qa_system_prompt),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

# Build conversational RAG chain
conversational_rag_chain = (
    RunnablePassthrough.assign(
        context=lambda x: format_docs(
            retriever.invoke(
                contextualize_q_chain.invoke(x) if x.get("chat_history") else x["input"]
            )
        )
    )
    | qa_prompt
    | llm
    | StrOutputParser()
)

print("✅ Conversational RAG chain created using LCEL!")

In [None]:
# Test multi-turn conversation
chat_history = []

# Turn 1
question1 = "What is NLP?"
response1 = conversational_rag_chain.invoke({
    "input": question1,
    "chat_history": chat_history
})
print(f"Question 1: {question1}")
print(f"Answer 1: {response1}\n")

# Update chat history
chat_history.extend([
    HumanMessage(content=question1),
    AIMessage(content=response1)
])

# Turn 2 (references previous context)
question2 = "What are some applications of it?"
response2 = conversational_rag_chain.invoke({
    "input": question2,
    "chat_history": chat_history
})
print(f"Question 2: {question2}")
print(f"Answer 2: {response2}")

print("\n✅ If Answer 2 lists NLP applications, the chatbot resolved 'it' to NLP using the chat history.")

---

## Challenge Exercise

Now it's your turn! Try these exercises:

1. **Add Your Own Documents**
   - Create a new text file with content on a topic you're interested in
   - Load and chunk it
   - Add to the vector store

2. **Experiment with Chunk Size**
   - Try different `chunk_size` values (100, 500, 1000)
   - Observe how it affects retrieval quality

3. **Adjust Retrieval Parameters**
   - Change `k` (number of documents retrieved)
   - Test with different queries

4. **Enhance Citations**
   - Modify `format_with_citations()` to include relevance scores
   - Add metadata (page numbers, document titles)

---

In [None]:
# Your code here!

# Example starting point:
# 1. Create new document
new_content = """
# Add your own content here
"""

# 2. Process and add to vector store
# text_splitter = ...
# new_chunks = ...
# vectorstore.add_documents(new_chunks)

# 3. Test with questions about your content

print("Add your code above to complete the challenge!")

---

## Summary & Key Takeaways

Congratulations! You've built a complete RAG chatbot. Here's what you accomplished:

### What You Built
✅ **Document Loading**: Loaded and chunked text documents  
✅ **Vector Storage**: Created embeddings and stored in ChromaDB  
✅ **Semantic Search**: Retrieved relevant documents based on meaning  
✅ **RAG Pipeline**: Combined retrieval with LLM generation  
✅ **Citations**: Added source attribution for transparency  
✅ **Conversation Memory**: Enabled multi-turn conversations  

### Key Concepts
- **RAG** = Retrieval (find relevant docs) + Augmented Generation (use context to answer)
- **Embeddings** capture semantic meaning as vectors
- **Vector databases** enable fast similarity search
- **Chunking** balances context vs specificity

### Production Considerations
- Use persistent storage for the vector database
- Implement error handling for API calls
- Add rate limiting to control API costs
- Monitor retrieval quality
- Update the vector store regularly as source documents change
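
One possible shape for the error-handling item is a retry wrapper with exponential backoff. This is a minimal sketch: `TransientError`, `call_with_retries`, and `flaky_call` are hypothetical names, and in real code you would catch the specific timeout or rate-limit exceptions raised by your client library instead.

```python
import time


class TransientError(Exception):
    """Hypothetical error type standing in for a timeout or rate-limit failure."""


def call_with_retries(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on TransientError with exponentially growing delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** (attempt - 1))  # waits 1x, 2x, 4x, ... base_delay


# Demo: a fake call that fails twice, then succeeds.
# Delays are recorded in a list instead of actually sleeping.
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("simulated failure")
    return "ok"


recorded_delays = []
result = call_with_retries(flaky_call, sleep=recorded_delays.append)
print(result, recorded_delays)
```

Passing `sleep` as a parameter keeps the wrapper easy to test; in the notebook you could wrap a chain call as `call_with_retries(lambda: rag_chain.invoke(question))` once the except clause targets the errors your client actually raises.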

### Next Steps
- Try different embedding models (text-embedding-3-large for better quality)
- Integrate with a web framework (FastAPI, Streamlit)
- Add document metadata (dates, authors, sources)
- Implement semantic caching for faster responses
- Add guardrails (input validation, PII filtering)
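
As a starting point for the guardrails item, the sketch below validates a question and masks email addresses before the text would reach the retriever or the LLM. The pattern, the 500-character limit, and `sanitize_question` are illustrative assumptions; a simple regex like this catches only one kind of PII and is not a complete filter.

```python
import re

# Simplified email pattern for illustration; it will not match every valid address
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
MAX_QUESTION_CHARS = 500  # arbitrary limit chosen for this example


def sanitize_question(question: str) -> str:
    """Reject empty or oversized input and mask email addresses."""
    cleaned = question.strip()
    if not cleaned:
        raise ValueError("Question is empty")
    if len(cleaned) > MAX_QUESTION_CHARS:
        raise ValueError(f"Question exceeds {MAX_QUESTION_CHARS} characters")
    return EMAIL_PATTERN.sub("[EMAIL]", cleaned)


safe_question = sanitize_question("  Email me at jane.doe@example.com about RAG  ")
print(safe_question)
```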

### Resources
- [LangChain RAG Documentation](https://python.langchain.com/docs/use_cases/question_answering/)
- [ChromaDB Documentation](https://docs.trychroma.com/)
- [OpenAI Embeddings Guide](https://platform.openai.com/docs/guides/embeddings)

---

**Portfolio Tip**: This RAG chatbot is a great addition to your portfolio. Consider:
- Deploying it as a web app
- Adding a custom UI
- Indexing domain-specific documents
- Publishing on GitHub with documentation

**Well done!** 🎉

---