# Building a RAG System with LangChain and ChromaDB

## What is RAG?
**Retrieval-Augmented Generation (RAG)** combines the power of large language models with external knowledge retrieval to provide accurate, contextual answers.

## Key Components Used:
- **LangChain**: Framework for building LLM applications with modular components
- **ChromaDB**: Vector database for storing and searching document embeddings  
- **Google Gemini**: For generating embeddings and language model responses

## Why RAG?
- ✅ Reduces AI hallucinations by grounding responses in real data
- ✅ Provides up-to-date information beyond training data
- ✅ Enables source citation and fact verification
- ✅ Works with domain-specific knowledge bases

In [None]:
"""
Environment Setup
Load environment variables containing API keys and configuration settings.
"""
import os
from dotenv import load_dotenv
load_dotenv()  # Loads variables from .env file into environment

True

In [None]:
"""
Import Required Libraries

LangChain Components:
- RecursiveCharacterTextSplitter: Breaks large documents into smaller chunks intelligently
- TextLoader: Reads text files and converts them to LangChain Document objects
- GoogleGenerativeAIEmbeddings: Converts text to numerical vectors using Google's models
- Document: Standard LangChain document format with content and metadata
- Chroma: Vector database integration for storing and searching embeddings
"""

# Text processing and document handling
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain.schema import Document

# Vector storage for embeddings
from langchain_community.vectorstores import Chroma

# Utility imports for data manipulation
import numpy as np
from typing import List

  from .autonotebook import tqdm as notebook_tqdm


In [None]:
"""
RAG System Architecture Overview

This prints a comprehensive explanation of how our RAG system works step-by-step.
"""
print("""
RAG (Retrieval-Augmented Generation) Architecture:

📄 1. Document Loading: Load documents from various sources
✂️  2. Document Splitting: Break documents into smaller chunks
🔢 3. Embedding Generation: Convert chunks into vector representations
💾 4. Vector Storage: Store embeddings in ChromaDB
🔍 5. Query Processing: Convert user query to embedding
🎯 6. Similarity Search: Find relevant chunks from vector store
🔗 7. Context Augmentation: Combine retrieved chunks with query
🤖 8. Response Generation: LLM generates answer using context

Benefits of RAG:
- Reduces hallucinations
- Provides up-to-date information
- Allows citing sources
- Works with domain-specific knowledge
""")


RAG (Retrieval-Augmented Generation) Architecture:

1. Document Loading: Load documents from various sources
2. Document Splitting: Break documents into smaller chunks
3. Embedding Generation: Convert chunks into vector representations
4. Vector Storage: Store embeddings in ChromaDB
5. Query Processing: Convert user query to embedding
6. Similarity Search: Find relevant chunks from vector store
7. Context Augmentation: Combine retrieved chunks with query
8. Response Generation: LLM generates answer using context

Benefits of RAG:
- Reduces hallucinations
- Provides up-to-date information
- Allows citing sources
- Works with domain-specific knowledge



## 1. Sample Data Creation
We'll create sample documents about AI/ML topics to demonstrate our RAG system.

In [None]:
"""
Sample Documents for RAG Demo

Creating three sample documents covering different AI/ML topics:
1. Machine Learning Fundamentals
2. Deep Learning and Neural Networks  
3. Natural Language Processing (NLP)

These will serve as our knowledge base for testing the RAG system.
"""
sample_docs = [
    """
    Machine Learning Fundamentals
    
    Machine learning is a subset of artificial intelligence that enables systems to learn 
    and improve from experience without being explicitly programmed. There are three main 
    types of machine learning: supervised learning, unsupervised learning, and reinforcement 
    learning. Supervised learning uses labeled data to train models, while unsupervised 
    learning finds patterns in unlabeled data. Reinforcement learning learns through 
    interaction with an environment using rewards and penalties.
    """,
    
    """
    Deep Learning and Neural Networks
    
    Deep learning is a subset of machine learning based on artificial neural networks. 
    These networks are inspired by the human brain and consist of layers of interconnected 
    nodes. Deep learning has revolutionized fields like computer vision, natural language 
    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly 
    effective for image processing, while Recurrent Neural Networks (RNNs) and Transformers 
    excel at sequential data processing.
    """,
    
    """
    Natural Language Processing (NLP)
    
    NLP is a field of AI that focuses on the interaction between computers and human language. 
    Key tasks in NLP include text classification, named entity recognition, sentiment analysis, 
    machine translation, and question answering. Modern NLP heavily relies on transformer 
    architectures like BERT, GPT, and T5. These models use attention mechanisms to understand 
    context and relationships between words in text.
    """
]

sample_docs

['\n    Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement \n    learning. Supervised learning uses labeled data to train models, while unsupervised \n    learning finds patterns in unlabeled data. Reinforcement learning learns through \n    interaction with an environment using rewards and penalties.\n    ',
 '\n    Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natural language \n    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly \n    effective f

In [None]:
"""
Save Sample Documents to Temporary Directory

Creates a temporary directory and saves each sample document as a separate text file.
This simulates having a collection of documents to process.
"""
import tempfile
temp_dir = tempfile.mkdtemp()  # Creates a temporary directory

# Save each document as a separate file
for i, doc in enumerate(sample_docs):
    with open(f"{temp_dir}/doc_{i}.txt", "w") as f:
        f.write(doc)

print(f"Sample documents created in: {temp_dir}")

Sample document create in : /tmp/tmpp3_74czj


In [None]:
"""
Alternative: Save Documents to Current Directory

Creates document files in the current working directory for easier access.
This approach doesn't use temporary directories.
"""
import tempfile
temp_dir = tempfile.mkdtemp()

# Save documents in current directory with simple naming
for i, doc in enumerate(sample_docs):
    with open(f"doc_{i}.txt", "w") as f:
        f.write(doc)

In [None]:
# Display the temporary directory path for reference
temp_dir

'/tmp/tmp8za7zr7r'

## 2. Document Loading
Loading documents from files and converting them into LangChain Document objects that can be processed by our RAG system.

In [None]:
"""
Load Documents from Directory

DirectoryLoader: Loads all files matching a pattern from a directory
- glob="*.txt": Only load .txt files
- loader_cls=TextLoader: Use TextLoader for each file
- encoding='utf-8': Handle text encoding properly

Each loaded document becomes a Document object with content and metadata.
"""
from langchain_community.document_loaders import DirectoryLoader, TextLoader

# Load all text files from the data directory
loader = DirectoryLoader(
    "data",  # Directory containing our documents
    glob="*.txt",  # Pattern to match text files only
    loader_cls=TextLoader,  # Use TextLoader for each file
    loader_kwargs={'encoding': 'utf-8'}  # Ensure proper text encoding
)
documents = loader.load()

print(f"Loaded {len(documents)} documents")
print(f"\nFirst document preview:")
print(documents[0].page_content[:200] + "...")  # Show first 200 characters

Loaded 3 documents

First document preview:

    Natural Language Processing (NLP)

    NLP is a field of AI that focuses on the interaction between computers and human language. 
    Key tasks in NLP include text classification, named entity r...


In [None]:
# Display the loaded documents structure
documents

[Document(metadata={'source': 'data/doc_2.txt'}, page_content='\n    Natural Language Processing (NLP)\n\n    NLP is a field of AI that focuses on the interaction between computers and human language. \n    Key tasks in NLP include text classification, named entity recognition, sentiment analysis, \n    machine translation, and question answering. Modern NLP heavily relies on transformer \n    architectures like BERT, GPT, and T5. These models use attention mechanisms to understand \n    context and relationships between words in text.\n    '),
 Document(metadata={'source': 'data/doc_0.txt'}, page_content='\n    Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement \n    learning. Supervised learning uses labeled data to train models,

## 3. Document Splitting
Breaking large documents into smaller, manageable chunks that fit within model context windows and improve retrieval accuracy.

In [None]:
"""
Split Documents into Chunks

RecursiveCharacterTextSplitter intelligently breaks documents:
- chunk_size=500: Maximum characters per chunk
- chunk_overlap=50: Characters shared between adjacent chunks (maintains context)
- length_function=len: How to measure chunk size
- separators=[" "]: Split on spaces first, then other separators

Overlap is crucial: it ensures important information isn't lost at chunk boundaries.
"""
# Initialize the text splitter with optimal parameters
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,  # Maximum size of each chunk
    chunk_overlap=50,  # Overlap between chunks to maintain context
    length_function=len,  # Use character count for measuring size
    separators=[" "]  # Hierarchy of separators (space first)
)

# Split all documents into chunks
chunks = text_splitter.split_documents(documents)

print(f"Created {len(chunks)} chunks from {len(documents)} documents")
print(f"\nChunk example:")
print(f"Content: {chunks[0].page_content[:150]}...")
print(f"Metadata: {chunks[0].metadata}")

Created 5 chunks from 3 documents

Chunk example:
Content: Natural Language Processing (NLP)

    NLP is a field of AI that focuses on the interaction between computers and human language. 
    Key tasks in NL...
Metadata: {'source': 'data/doc_2.txt'}


In [None]:
# Display the chunks structure
chunks

[Document(metadata={'source': 'data/doc_2.txt'}, page_content='Natural Language Processing (NLP)\n\n    NLP is a field of AI that focuses on the interaction between computers and human language. \n    Key tasks in NLP include text classification, named entity recognition, sentiment analysis, \n    machine translation, and question answering. Modern NLP heavily relies on transformer \n    architectures like BERT, GPT, and T5. These models use attention mechanisms to understand \n    context and relationships between words in text.'),
 Document(metadata={'source': 'data/doc_0.txt'}, page_content='Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement \n    learning. Supervised learning uses labeled data to train models, while unsupervise

In [None]:
"""
Set up Google API Key for Embeddings

Google Generative AI requires authentication via API key.
The key should be stored in environment variables for security.
"""
os.environ["GOOGLE_API_KEY"] = os.getenv("GOOGLE_API_KEY")

In [None]:
"""
Initialize Google Embeddings Model

GoogleGenerativeAIEmbeddings converts text to numerical vectors:
- model="models/gemini-embedding-001": Specific Google embedding model
- These vectors capture semantic meaning of text
- Similar texts have similar vector representations
"""
sample_text = "Machine Learning is fascinating"
embeddings = GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-001")
embeddings

GoogleGenerativeAIEmbeddings(client=<google.ai.generativelanguage_v1beta.services.generative_service.client.GenerativeServiceClient object at 0x7f17025c99d0>, model='models/gemini-embedding-001', task_type=None, google_api_key=SecretStr('**********'), credentials=None, client_options=None, transport=None, request_options=None)

In [None]:
"""
Generate Sample Embedding Vector

This demonstrates how text gets converted to numerical vectors.
The resulting vector has hundreds of dimensions representing semantic meaning.
"""
vector = embeddings.embed_query(sample_text)
vector  # This will show a long list of floating-point numbers

[-0.010484361089766026,
 0.032966647297143936,
 0.01858656480908394,
 -0.07652363926172256,
 -0.011906582862138748,
 0.004588339943438768,
 0.012288067489862442,
 0.033216264098882675,
 -0.016090691089630127,
 -0.00437378091737628,
 0.013594836927950382,
 -0.010538428090512753,
 0.015617091208696365,
 0.02469649910926819,
 0.12497039139270782,
 0.00884399376809597,
 -0.020016567781567574,
 -0.020702868700027466,
 -0.0031074886210262775,
 -0.01656494289636612,
 -0.0025832359679043293,
 0.003024312201887369,
 0.02058418095111847,
 -0.016372574493288994,
 -0.015559299848973751,
 -0.015602415427565575,
 0.05185103788971901,
 0.004539222456514835,
 0.018151642754673958,
 -0.006498632952570915,
 0.009674380533397198,
 0.030054643750190735,
 0.01871102675795555,
 0.020886773243546486,
 -0.01419147476553917,
 0.018303807824850082,
 0.017977919429540634,
 0.0021986868232488632,
 0.008418558165431023,
 0.014707104302942753,
 0.012362045235931873,
 0.002350432798266411,
 0.012088296934962273,
 -0

## 4. Initialize ChromaDB Vector Store
Creating a vector database to store document chunks as searchable embeddings. This enables fast similarity search for relevant content retrieval.

In [None]:
# Display chunks before storing them in the vector database
chunks

[Document(metadata={'source': 'data/doc_2.txt'}, page_content='Natural Language Processing (NLP)\n\n    NLP is a field of AI that focuses on the interaction between computers and human language. \n    Key tasks in NLP include text classification, named entity recognition, sentiment analysis, \n    machine translation, and question answering. Modern NLP heavily relies on transformer \n    architectures like BERT, GPT, and T5. These models use attention mechanisms to understand \n    context and relationships between words in text.'),
 Document(metadata={'source': 'data/doc_0.txt'}, page_content='Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement \n    learning. Supervised learning uses labeled data to train models, while unsupervise

In [None]:
"""
Create ChromaDB Vector Store

Chroma.from_documents creates a vector database:
- documents=chunks: Store all our document chunks
- embedding=GoogleGenerativeAI: Use Google's embedding model
- persist_directory: Where to save the database (survives restarts)
- collection_name: Name for this specific collection

This process:
1. Converts each chunk to embeddings using Google's model
2. Stores embeddings and original text in ChromaDB
3. Creates searchable index for similarity queries
"""
persist_directory = "./chroma_db"

# Create and populate the vector store
vectorstore = Chroma.from_documents(
    documents=chunks,  # Our document chunks to store
    embedding=GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-001"),
    persist_directory=persist_directory,  # Where to save the database
    collection_name="rag_collection"  # Name for this collection
)

print(f"Vector store created with {vectorstore._collection.count()} vectors")
print(f"Persisted to: {persist_directory}")

Vector store created with 5 vectors
Persisted to: ./chroma_db


## 5. Test Similarity Search
Testing our vector store by searching for documents similar to different queries. This demonstrates how retrieval works in RAG systems.

In [None]:
"""
Test Similarity Search - Machine Learning Query

similarity_search finds documents most similar to the query:
- Converts query to embedding vector
- Compares with stored document embeddings
- Returns k most similar chunks
"""
query = "What are the types of machine learning?"

# Find top 3 most similar document chunks
similar_docs = vectorstore.similarity_search(query, k=3)
similar_docs

[Document(metadata={'source': 'data/doc_0.txt'}, page_content='Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement \n    learning. Supervised learning uses labeled data to train models, while unsupervised \n    learning finds patterns in unlabeled data. Reinforcement learning learns through'),
 Document(metadata={'source': 'data/doc_0.txt'}, page_content='data. Reinforcement learning learns through \n    interaction with an environment using rewards and penalties.'),
 Document(metadata={'source': 'data/doc_1.txt'}, page_content='Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of inter

In [None]:
"""Test NLP-related query"""
query = "what is NLP?"

similar_docs = vectorstore.similarity_search(query, k=3)
similar_docs

[Document(metadata={'source': 'data/doc_2.txt'}, page_content='Natural Language Processing (NLP)\n\n    NLP is a field of AI that focuses on the interaction between computers and human language. \n    Key tasks in NLP include text classification, named entity recognition, sentiment analysis, \n    machine translation, and question answering. Modern NLP heavily relies on transformer \n    architectures like BERT, GPT, and T5. These models use attention mechanisms to understand \n    context and relationships between words in text.'),
 Document(metadata={'source': 'data/doc_1.txt'}, page_content='Neural Networks (RNNs) and Transformers \n    excel at sequential data processing.'),
 Document(metadata={'source': 'data/doc_1.txt'}, page_content='Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has r

In [None]:
"""Test Deep Learning query"""
query = "what is Deep Learning?"

similar_docs = vectorstore.similarity_search(query, k=3)
similar_docs

[Document(metadata={'source': 'data/doc_1.txt'}, page_content='Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natural language \n    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly \n    effective for image processing, while Recurrent Neural Networks (RNNs) and Transformers'),
 Document(metadata={'source': 'data/doc_0.txt'}, page_content='Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement \n    learning. Supervised learning uses labeled data to train

In [None]:
"""
Display Search Results in Human-Readable Format

This helper function makes it easier to understand what documents
were retrieved and why they might be relevant to the query.
"""
print(f"Query: {query}")
print(f"\nTop {len(similar_docs)} similar chunks:")
for i, doc in enumerate(similar_docs):
    print(f"\n--- Chunk {i+1} ---")
    print(doc.page_content[:200] + "...")  # First 200 characters
    print(f"Source: {doc.metadata.get('source', 'Unknown')}")

Query: what is Deep Learning?

Top 3 similar chunks:

--- Chunk 1 ---
Deep Learning and Neural Networks

    Deep learning is a subset of machine learning based on artificial neural networks. 
    These networks are inspired by the human brain and consist of layers of i...
Source: data/doc_1.txt

--- Chunk 2 ---
Machine Learning Fundamentals

    Machine learning is a subset of artificial intelligence that enables systems to learn 
    and improve from experience without being explicitly programmed. There are...
Source: data/doc_0.txt

--- Chunk 3 ---
data. Reinforcement learning learns through 
    interaction with an environment using rewards and penalties....
Source: data/doc_0.txt


## 6. Advanced Similarity Search with Scores
Understanding how similar each retrieved document is to your query helps evaluate retrieval quality.

In [None]:
"""
Similarity Search with Confidence Scores

similarity_search_with_score returns both documents and similarity scores:
- Lower scores = MORE similar (using L2/Euclidean distance)
- Helps understand retrieval confidence
- Useful for filtering low-quality matches
"""
results_scores = vectorstore.similarity_search_with_score(query, k=3)
results_scores

[(Document(metadata={'source': 'data/doc_1.txt'}, page_content='Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natural language \n    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly \n    effective for image processing, while Recurrent Neural Networks (RNNs) and Transformers'),
  0.5264936089515686),
 (Document(metadata={'source': 'data/doc_0.txt'}, page_content='Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement \n    learning. Supervised learning u

## Understanding Similarity Scores

**ChromaDB Scoring System:**

**L2 Distance (Default):**
- **Lower scores = MORE similar** (closer in vector space)
- Score of 0 = identical vectors
- Typical range: 0 to 2+ (can be higher for very different content)

**Cosine Similarity (Alternative):**
- Higher scores = MORE similar  
- Range: -1 to 1 (1 being identical)

**Practical Guidelines:**
- Scores < 0.5: Very similar, high confidence
- Scores 0.5-1.0: Moderately similar, good for RAG
- Scores > 1.0: Less similar, may not be relevant

## 7. Building the Complete RAG System
Now we'll create a full RAG pipeline that combines retrieval with generation to answer questions using our knowledge base.

In [None]:
"""
Alternative: OpenAI LLM Setup (Commented Out)

This shows how you could use OpenAI's models instead of Google's:
- Requires OPENAI_API_KEY environment variable
- Uses ChatOpenAI for the language model component
"""
# from langchain_openai import ChatOpenAI
# 
# llm = ChatOpenAI(
#     model_name="gpt-3.5-turbo"
# )

In [None]:
"""
Initialize Google's Language Model

ChatGoogleGenerativeAI provides the generative capabilities:
- model="gemini-2.0-flash": Latest Gemini model
- temperature=0: Deterministic responses (less creative, more factual)
- max_tokens=None: No limit on response length
- timeout/max_retries: Handle API reliability
"""
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    temperature=0,  # More factual, less creative responses
    max_tokens=None,  # No response length limit
    timeout=None,
    max_retries=2,
)

In [None]:
"""Test the Language Model

Quick test to ensure the LLM is working properly before integrating 
it into our RAG pipeline.
"""
test_response = llm.invoke("What is Large Language Models")
test_response

AIMessage(content='Large Language Models (LLMs) are a type of artificial intelligence (AI) model that are trained on massive amounts of text data to understand and generate human-like text. They are designed to predict the next word in a sequence, given the preceding words. By learning patterns and relationships in the data, they can perform a wide range of natural language processing (NLP) tasks.\n\nHere\'s a breakdown of key aspects:\n\n**Key Characteristics:**\n\n*   **Large Scale:** The "large" in LLM refers to the massive size of both the model (number of parameters) and the dataset it\'s trained on.  These models often have billions or even trillions of parameters.\n*   **Transformer Architecture:** Most modern LLMs are based on the transformer architecture, which is particularly well-suited for processing sequential data like text. Transformers use self-attention mechanisms to weigh the importance of different words in a sentence, allowing them to capture long-range dependencies

In [None]:
"""
Alternative LLM Initialization (Commented Out)

Shows another way to initialize models using init_chat_model:
- Supports multiple providers (OpenAI, Groq, etc.)
- Unified interface for different LLM providers
"""
# from langchain.chat_models.base import init_chat_model
# 
# llm = init_chat_model("openai:gpt-3.5-turbo")
# # llm = init_chat_model("groq:")
# llm

In [None]:
# Test alternative LLM setup
# llm.invoke("What is AI")

## 8. Modern RAG Chain Implementation
Creating a complete RAG pipeline using LangChain's latest chain abstractions for retrieval and generation.

In [None]:
"""
Import Modern RAG Chain Components

- create_retrieval_chain: Combines retriever + document processor
- ChatPromptTemplate: Structures prompts with variables
- create_stuff_documents_chain: Processes retrieved documents with LLM
"""
from langchain.chains import create_retrieval_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain

In [None]:
"""
Convert Vector Store to Retriever

as_retriever converts our vector store into a retriever component:
- search_kwargs={"k":3}: Retrieve top 3 relevant chunks per query
- This becomes the "retrieval" part of our RAG system
"""
# Convert vector store to retriever interface
retriever = vectorstore.as_retriever(
    search_kwarg={"k": 3}  # Retrieve top 3 relevant chunks
)
retriever

VectorStoreRetriever(tags=['Chroma', 'GoogleGenerativeAIEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x7f17025ca030>, search_kwargs={})

In [None]:
"""
Create RAG Prompt Template

This prompt template structures how the LLM uses retrieved context:
- {context}: Placeholder for retrieved document chunks
- {input}: Placeholder for user's question
- Instructions guide the LLM's behavior and response style
"""
from langchain_core.prompts import ChatPromptTemplate

system_prompt = """You are an assistant for question-answering tasks. 
Use the following pieces of retrieved context to answer the question. 
If you don't know the answer, just say that you don't know. 
Use three sentences maximum and keep the answer concise.

Context: {context}"""

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", "{input}")
])

In [None]:
# Display the prompt structure
prompt

ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. \nUse the following pieces of retrieved context to answer the question. \nIf you don't know the answer, just say that you don't know. \nUse three sentences maximum and keep the answer concise.\n\nContext: {context}"), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], input_types={}, partial_variables={}, template='{input}'), additional_kwargs={})])

## Understanding create_stuff_documents_chain

**What it does:**
`create_stuff_documents_chain` creates a processing chain that:
1. Takes retrieved documents from the retriever
2. "Stuffs" them into the prompt's `{context}` placeholder  
3. Sends the complete prompt to the LLM
4. Returns the LLM's response

**Why "stuff"?** It literally stuffs all retrieved documents into a single prompt context window.

In [None]:
"""
Create Document Processing Chain

create_stuff_documents_chain combines LLM + prompt to process documents:
- Takes retrieved documents
- Inserts them into prompt template
- Sends to LLM for processing
- Returns generated response
"""
from langchain.chains.combine_documents import create_stuff_documents_chain

document_chain = create_stuff_documents_chain(llm, prompt)
document_chain

RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableLambda(format_docs)
}), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
| ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. \nUse the following pieces of retrieved context to answer the question. \nIf you don't know the answer, just say that you don't know. \nUse three sentences maximum and keep the answer concise.\n\nContext: {context}"), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], input_types={}, partial_variables={}, template='{input}'), additional_kwargs={})])
| ChatGoogleGenerativeAI(model='models/gemini-2.0-flash', google_api_key=SecretStr('**********'), temperature=0.0, max_retries=2, clie

## Chain Components Relationship

**Document Chain Flow:**
1. 📄 Retrieved documents come in
2. 📝 Documents get inserted into prompt template  
3. 🤖 Complete prompt goes to LLM
4. ✨ LLM generates contextual response
5. 📤 Response is returned to user

## Understanding create_retrieval_chain

**Complete RAG Pipeline:**
`create_retrieval_chain` combines:
- **Retriever**: Finds relevant documents from vector store
- **Document Chain**: Processes documents with LLM

**Result**: End-to-end RAG system that retrieves + generates answers automatically.

In [None]:
"""
Create Complete RAG Chain

create_retrieval_chain combines retriever + document_chain:
1. User asks question
2. Retriever finds relevant documents  
3. Document chain processes documents with LLM
4. System returns answer with sources

This is our complete RAG pipeline!
"""
from langchain.chains import create_retrieval_chain

rag_chain = create_retrieval_chain(retriever, document_chain)
rag_chain

RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableBinding(bound=RunnableLambda(lambda x: x['input'])
           | VectorStoreRetriever(tags=['Chroma', 'GoogleGenerativeAIEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x7f17025ca030>, search_kwargs={}), kwargs={}, config={'run_name': 'retrieve_documents'}, config_factories=[])
})
| RunnableAssign(mapper={
    answer: RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
              context: RunnableLambda(format_docs)
            }), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
            | ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. \nUse the following pieces of retrieved context to answer the question. \nIf 

In [None]:
"""Test the Complete RAG System"""
response = rag_chain.invoke({"input": "What is Deep Learning?"})

In [None]:
# Display full response structure (includes answer + context)
response

{'input': 'What is Deep Learning?',
 'context': [Document(metadata={'source': 'data/doc_1.txt'}, page_content='Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natural language \n    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly \n    effective for image processing, while Recurrent Neural Networks (RNNs) and Transformers'),
  Document(metadata={'source': 'data/doc_0.txt'}, page_content='Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement \n    learnin

In [None]:
# Extract just the answer
response["answer"]

'Deep learning is a subset of machine learning based on artificial neural networks. These networks are inspired by the human brain and consist of layers of interconnected nodes. It has revolutionized fields like computer vision, natural language processing, and speech recognition.'

In [None]:
"""
Comprehensive RAG Query Function

This function demonstrates a complete RAG interaction:
1. Takes a question
2. Retrieves relevant context
3. Generates answer using LLM
4. Shows both answer and sources used
"""
def query_rag_modern(question):
    """Query the RAG system and display detailed results"""
    print(f"Question: {question}")
    print("-" * 50)
    
    # Use the complete RAG chain
    result = rag_chain.invoke({"input": question})
    
    # Display answer
    print(f"Answer: {result['answer']}")
    
    # Display retrieved context sources
    print("\nRetrieved Context:")
    for i, doc in enumerate(result['context']):
        print(f"\n--- Source {i+1} ---")
        print(doc.page_content[:200] + "...")
    
    return result

# Test multiple questions to demonstrate RAG capabilities
test_questions = [
    "What are the three types of machine learning?",
    "What is deep learning and how does it relate to neural networks?", 
    "What are CNNs best used for?"
]

for question in test_questions:
    result = query_rag_modern(question)
    print("\n" + "="*80 + "\n")

Question: What are the three types of machine learning?
--------------------------------------------------
Answer: The three main types of machine learning are supervised learning, unsupervised learning, and reinforcement learning. Supervised learning uses labeled data, while unsupervised learning finds patterns in unlabeled data. Reinforcement learning learns through interaction with an environment using rewards and penalties.

Retrieved Context:

--- Source 1 ---
Machine Learning Fundamentals

    Machine learning is a subset of artificial intelligence that enables systems to learn 
    and improve from experience without being explicitly programmed. There are...

--- Source 2 ---
data. Reinforcement learning learns through 
    interaction with an environment using rewards and penalties....

--- Source 3 ---
Deep Learning and Neural Networks

    Deep learning is a subset of machine learning based on artificial neural networks. 
    These networks are inspired by the human brain and

## 9. Alternative RAG Implementation with LCEL
LangChain Expression Language (LCEL) provides a more flexible way to build RAG chains with custom logic and processing steps.

In [None]:
"""
LCEL Components for Custom RAG Chain

- StrOutputParser: Converts LLM output to plain string
- RunnablePassthrough: Passes input through unchanged  
- RunnableParallel: Runs multiple operations in parallel

LCEL allows more granular control over the RAG pipeline.
"""
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableParallel

In [None]:
"""
Custom Prompt Template for LCEL Chain

This prompt gives more detailed instructions for using context
and provides specific guidance for handling unknown information.
"""
custom_prompt = ChatPromptTemplate.from_template("""Use the following context to answer the question. 
If you don't know the answer based on the context, say you don't know.
Provide specific details from the context to support your answer.

Context:
{context}

Question: {question}

Answer:""")
custom_prompt

ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="Use the following context to answer the question. \nIf you don't know the answer based on the context, say you don't know.\nProvide specific details from the context to support your answer.\n\nContext:\n{context}\n\nQuestion: {question}\n\nAnswer:"), additional_kwargs={})])

In [None]:
"""
Reinitialize Retriever for LCEL Chain

Ensuring we have a properly configured retriever component
for the custom LCEL implementation.
"""
retriever = vectorstore.as_retriever(
    search_kwarg={"k": 3}  # Retrieve top 3 relevant chunks
)
retriever

VectorStoreRetriever(tags=['Chroma', 'GoogleGenerativeAIEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x7f17025ca030>, search_kwargs={})

In [None]:
"""
Document Formatting Function

format_docs converts list of Document objects to plain text:
- Takes retrieved documents
- Extracts page_content from each
- Joins with double newlines for readability
- Returns formatted string for prompt context
"""
def format_docs(docs):
    """Convert retrieved documents to formatted text string"""
    return "\n\n".join(doc.page_content for doc in docs)

In [None]:
"""
Build RAG Chain Using LCEL

This chain uses LCEL syntax (| operators) to create a pipeline:
1. Parallel processing: context from retriever, question passthrough
2. format_docs processes retrieved documents  
3. Prompt template combines context + question
4. LLM generates response
5. StrOutputParser extracts plain text

RunnablePassthrough: Passes the input question unchanged
| operator: Chains components together in sequence
"""
rag_chain_lcel = (
    {  # Parallel processing of inputs
        "context": retriever | format_docs,  # Retrieve docs → format them
        "question": RunnablePassthrough()  # Pass question through unchanged
    }
    | custom_prompt  # Insert into prompt template
    | llm  # Generate response with LLM
    | StrOutputParser()  # Convert to plain string
)

rag_chain_lcel

{
  context: VectorStoreRetriever(tags=['Chroma', 'GoogleGenerativeAIEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x7f17025ca030>, search_kwargs={})
           | RunnableLambda(format_docs),
  question: RunnablePassthrough()
}
| ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="Use the following context to answer the question. \nIf you don't know the answer based on the context, say you don't know.\nProvide specific details from the context to support your answer.\n\nContext:\n{context}\n\nQuestion: {question}\n\nAnswer:"), additional_kwargs={})])
| ChatGoogleGenerativeAI(model='models/gemini-2.0-flash', google_api_key=SecretStr('**********'), temperature=0.0, max_retries=2, client=<google.ai.generativelanguage_v1beta.services.generative_service.client.Ge

In [None]:
"""Test LCEL RAG Chain"""
response = rag_chain_lcel.invoke("What is Deep Learning")
response

'Deep learning is a subset of machine learning based on artificial neural networks. These networks are inspired by the human brain and consist of layers of interconnected nodes.'

In [None]:
"""Test Direct Retriever Access"""
retriever.get_relevant_documents("What is Deep Learning")

  retriever.get_relevant_documents("What is Deep Learning")


[Document(metadata={'source': 'data/doc_1.txt'}, page_content='Deep Learning and Neural Networks\n\n    Deep learning is a subset of machine learning based on artificial neural networks. \n    These networks are inspired by the human brain and consist of layers of interconnected \n    nodes. Deep learning has revolutionized fields like computer vision, natural language \n    processing, and speech recognition. Convolutional Neural Networks (CNNs) are particularly \n    effective for image processing, while Recurrent Neural Networks (RNNs) and Transformers'),
 Document(metadata={'source': 'data/doc_0.txt'}, page_content='Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement \n    learning. Supervised learning uses labeled data to train

In [None]:
"""
LCEL RAG Query Function

Demonstrates querying with LCEL chain:
- Shows both answer and source documents
- Provides transparency into retrieval process
- Useful for debugging and understanding system behavior
"""
def query_rag_lcel(question):
    """Query RAG system using LCEL implementation"""
    print(f"Question: {question}")
    print("-" * 50)
    
    # Get answer using LCEL chain (pass string directly)
    answer = rag_chain_lcel.invoke(question)
    print(f"Answer: {answer}")
    
    # Get source documents separately for display
    docs = retriever.get_relevant_documents(question)
    print("\nSource Documents:")
    for i, doc in enumerate(docs):
        print(f"\n--- Source {i+1} ---")
        print(doc.page_content[:200] + "...")

In [None]:
"""Test LCEL Chain with Sample Questions"""
print("Testing LCEL Chain:")
query_rag_lcel("What are the key concepts in reinforcement learning?")

Testing LCEL Chain:
Question: What are the key concepts in reinforcement learning?
--------------------------------------------------
Answer: Based on the context, the key concept in reinforcement learning is that it "learns through interaction with an environment using rewards and penalties."

Source Documents:

--- Source 1 ---
data. Reinforcement learning learns through 
    interaction with an environment using rewards and penalties....

--- Source 2 ---
Machine Learning Fundamentals

    Machine learning is a subset of artificial intelligence that enables systems to learn 
    and improve from experience without being explicitly programmed. There are...

--- Source 3 ---
Deep Learning and Neural Networks

    Deep learning is a subset of machine learning based on artificial neural networks. 
    These networks are inspired by the human brain and consist of layers of i...

--- Source 4 ---
Neural Networks (RNNs) and Transformers 
    excel at sequential data processing....


In [52]:
query_rag_lcel("What is machine learning?")

Question: What is machine learning?
--------------------------------------------------
Answer: Machine learning is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed.

Source Documents:

--- Source 1 ---
Machine Learning Fundamentals

    Machine learning is a subset of artificial intelligence that enables systems to learn 
    and improve from experience without being explicitly programmed. There are...

--- Source 2 ---
Deep Learning and Neural Networks

    Deep learning is a subset of machine learning based on artificial neural networks. 
    These networks are inspired by the human brain and consist of layers of i...

--- Source 3 ---
data. Reinforcement learning learns through 
    interaction with an environment using rewards and penalties....

--- Source 4 ---
Natural Language Processing (NLP)

    NLP is a field of AI that focuses on the interaction between computers and human language. 
    Key tas

In [None]:
query_rag_lcel("What is deep learning?")

Question: What is depe learning?
--------------------------------------------------
Answer: Deep learning is a subset of machine learning based on artificial neural networks. These networks are inspired by the human brain and consist of layers of interconnected nodes.

Source Documents:

--- Source 1 ---
data. Reinforcement learning learns through 
    interaction with an environment using rewards and penalties....

--- Source 2 ---
Deep Learning and Neural Networks

    Deep learning is a subset of machine learning based on artificial neural networks. 
    These networks are inspired by the human brain and consist of layers of i...

--- Source 3 ---
Machine Learning Fundamentals

    Machine learning is a subset of artificial intelligence that enables systems to learn 
    and improve from experience without being explicitly programmed. There are...

--- Source 4 ---
Natural Language Processing (NLP)

    NLP is a field of AI that focuses on the interaction between computers and human

## 10. Dynamic Knowledge Base Updates
Demonstrating how to add new information to an existing vector store without rebuilding the entire system.

In [None]:
# Check current vector store status
vectorstore

<langchain_community.vectorstores.chroma.Chroma at 0x7f17025ca030>

In [None]:
"""
Create New Document Content

Adding detailed information about Reinforcement Learning
to expand our knowledge base and test dynamic updates.
"""
new_document = """
Reinforcement Learning in Detail

Reinforcement learning (RL) is a type of machine learning where an agent learns to make 
decisions by interacting with an environment. The agent receives rewards or penalties 
based on its actions and learns to maximize cumulative reward over time. Key concepts 
in RL include: states, actions, rewards, policies, and value functions. Popular RL 
algorithms include Q-learning, Deep Q-Networks (DQN), Policy Gradient methods, and 
Actor-Critic methods. RL has been successfully applied to game playing (like AlphaGo), 
robotics, and autonomous systems.
"""

In [None]:
# Check existing chunks before adding new ones
chunks

[Document(metadata={'source': 'data/doc_2.txt'}, page_content='Natural Language Processing (NLP)\n\n    NLP is a field of AI that focuses on the interaction between computers and human language. \n    Key tasks in NLP include text classification, named entity recognition, sentiment analysis, \n    machine translation, and question answering. Modern NLP heavily relies on transformer \n    architectures like BERT, GPT, and T5. These models use attention mechanisms to understand \n    context and relationships between words in text.'),
 Document(metadata={'source': 'data/doc_0.txt'}, page_content='Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement \n    learning. Supervised learning uses labeled data to train models, while unsupervise

In [None]:
"""
Create Document Object for New Content

Converting raw text to LangChain Document format:
- page_content: The actual text content
- metadata: Additional information (source, topic, etc.)
"""
new_doc = Document(
    page_content=new_document,
    metadata={"source": "manual_addition", "topic": "reinforcement_learning"}
)

In [None]:
# Display the new document structure
new_doc

Document(metadata={'source': 'manual_addition', 'topic': 'reinforcement_learning'}, page_content='\nReinforcement Learning in Detail\n\nReinforcement learning (RL) is a type of machine learning where an agent learns to make \ndecisions by interacting with an environment. The agent receives rewards or penalties \nbased on its actions and learns to maximize cumulative reward over time. Key concepts \nin RL include: states, actions, rewards, policies, and value functions. Popular RL \nalgorithms include Q-learning, Deep Q-Networks (DQN), Policy Gradient methods, and \nActor-Critic methods. RL has been successfully applied to game playing (like AlphaGo), \nrobotics, and autonomous systems.\n')

In [None]:
"""
Split New Document into Chunks

Using the same text splitter to maintain consistency:
- Ensures new chunks have similar size to existing ones
- Maintains overlap for context preservation
"""
new_chunks = text_splitter.split_documents([new_doc])
new_chunks

[Document(metadata={'source': 'manual_addition', 'topic': 'reinforcement_learning'}, page_content='Reinforcement Learning in Detail\n\nReinforcement learning (RL) is a type of machine learning where an agent learns to make \ndecisions by interacting with an environment. The agent receives rewards or penalties \nbased on its actions and learns to maximize cumulative reward over time. Key concepts \nin RL include: states, actions, rewards, policies, and value functions. Popular RL \nalgorithms include Q-learning, Deep Q-Networks (DQN), Policy Gradient methods, and \nActor-Critic methods. RL has been'),
 Document(metadata={'source': 'manual_addition', 'topic': 'reinforcement_learning'}, page_content='methods, and \nActor-Critic methods. RL has been successfully applied to game playing (like AlphaGo), \nrobotics, and autonomous systems.')]

In [None]:
"""
Add New Chunks to Existing Vector Store

add_documents method:
1. Converts new chunks to embeddings
2. Stores them in existing ChromaDB collection  
3. Updates the searchable index
4. No need to rebuild entire vector store
"""
vectorstore.add_documents(new_chunks)

['e79bfafe-8688-4026-a63d-c6edd7f08aff',
 'bd468126-aed9-4395-b0a2-2eeb36ba25fa']

In [None]:
"""Display Updated Vector Store Statistics"""
print(f"Added {len(new_chunks)} new chunks to the vector store")
print(f"Total vectors now: {vectorstore._collection.count()}")

Added 2 new chunks to the vector store
Total vectors now: 5


In [None]:
"""
Test Updated Knowledge Base

Querying about reinforcement learning should now return
information from our newly added document.
"""
new_question = "What are the key concepts in reinforcement learning"
result = query_rag_lcel(new_question)
result

Question: What are the keys concepts in reinforcement learning
--------------------------------------------------
Answer: Based on the context, the key concepts in reinforcement learning are: states, actions, rewards, policies, and value functions.

Source Documents:

--- Source 1 ---
Reinforcement Learning in Detail

Reinforcement learning (RL) is a type of machine learning where an agent learns to make 
decisions by interacting with an environment. The agent receives rewards or p...

--- Source 2 ---
data. Reinforcement learning learns through 
    interaction with an environment using rewards and penalties....

--- Source 3 ---
methods, and 
Actor-Critic methods. RL has been successfully applied to game playing (like AlphaGo), 
robotics, and autonomous systems....

--- Source 4 ---
Machine Learning Fundamentals

    Machine learning is a subset of artificial intelligence that enables systems to learn 
    and improve from experience without being explicitly programmed. There are...


## 11. Advanced RAG: Conversational Memory

### Why Conversational Memory Matters?

**Problem:** Traditional RAG only considers the current query, missing conversational context.

**Example:**
- User: "Tell me about Python"  
- Bot: *explains Python programming*
- User: "What are its main libraries?" ← **"its" refers to Python, but RAG doesn't know this**

**Solution:** Conversational RAG maintains chat history and reformulates context-dependent questions.

### Key Components:
- **create_history_aware_retriever**: Makes retriever understand conversation context
- **MessagesPlaceholder**: Stores chat history in prompts  
- **HumanMessage/AIMessage**: Structured conversation format

In [None]:
"""
Import Conversational RAG Components

- create_history_aware_retriever: Adds chat history awareness to retrieval
- MessagesPlaceholder: Template placeholder for conversation history
- HumanMessage/AIMessage: Structured message types for chat history
"""
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder
from langchain_core.messages import HumanMessage, AIMessage

In [None]:
"""
Create History-Aware Query Reformulation Prompt

This prompt teaches the LLM to:
1. Look at chat history + current question
2. Reformulate context-dependent questions into standalone queries
3. NOT answer the question, just make it self-contained

Example transformation:
"What are its types?" → "What are the types of machine learning?"
"""
contextualize_q_system_prompt = """Given a chat history and the latest user question 
which might reference context in the chat history, formulate a standalone question 
which can be understood without the chat history. Do NOT answer the question, 
just reformulate it if needed and otherwise return it as is."""

contextualize_q_prompt = ChatPromptTemplate.from_messages([
    ("system", contextualize_q_system_prompt),
    MessagesPlaceholder("chat_history"),  # Placeholder for conversation history
    ("human", "{input}"),  # Current user question
])

In [None]:
"""
Create History-Aware Retriever

create_history_aware_retriever enhances our retriever:
1. Takes chat history + current question
2. Uses LLM to reformulate question if needed  
3. Uses reformulated question for document retrieval
4. Returns relevant documents with full context understanding
"""
history_aware_retriever = create_history_aware_retriever(
    llm,  # LLM for query reformulation
    retriever,  # Our original document retriever
    contextualize_q_prompt  # Prompt for reformulation
)
history_aware_retriever

RunnableBinding(bound=RunnableBranch(branches=[(RunnableLambda(lambda x: not x.get('chat_history', False)), RunnableLambda(lambda x: x['input'])
| VectorStoreRetriever(tags=['Chroma', 'GoogleGenerativeAIEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x7f17025ca030>, search_kwargs={}))], default=ChatPromptTemplate(input_variables=['chat_history', 'input'], input_types={'chat_history': list[typing.Annotated[typing.Union[typing.Annotated[langchain_core.messages.ai.AIMessage, Tag(tag='ai')], typing.Annotated[langchain_core.messages.human.HumanMessage, Tag(tag='human')], typing.Annotated[langchain_core.messages.chat.ChatMessage, Tag(tag='chat')], typing.Annotated[langchain_core.messages.system.SystemMessage, Tag(tag='system')], typing.Annotated[langchain_core.messages.function.FunctionMessage, Tag(tag='function')], typing.Annotated[langchain_core.messages.tool.ToolMessage, Tag(tag='tool')], typing.Annotated[langchain_core.messages.ai.AIMessageChunk, Tag(

In [None]:
"""
Create Conversational QA Chain

Building a complete conversational RAG system:
1. History-aware prompt includes chat history
2. Document chain processes retrieved documents
3. Retrieval chain combines everything together
"""
# Create prompt that includes chat history
qa_system_prompt = """You are an assistant for question-answering tasks. 
Use the following pieces of retrieved context to answer the question. 
If you don't know the answer, just say that you don't know. 
Use three sentences maximum and keep the answer concise.

Context: {context}"""

qa_prompt = ChatPromptTemplate.from_messages([
    ("system", qa_system_prompt),
    MessagesPlaceholder("chat_history"),  # Include conversation history
    ("human", "{input}"),
])

# Create document processing chain with history awareness
question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

# Create complete conversational RAG chain
conversational_rag_chain = create_retrieval_chain(
    history_aware_retriever,  # History-aware retrieval
    question_answer_chain  # History-aware document processing
)
print("Conversational RAG chain created!")

Conversational RAG chain created!


In [None]:
"""
Test Conversational RAG - First Question

Starting a conversation about machine learning.
Notice how we initialize empty chat history.
"""
chat_history = []

# First question establishes context
result1 = conversational_rag_chain.invoke({
    "chat_history": chat_history,
    "input": "What is machine learning?"
})
print(f"Q: What is machine learning?")
print(f"A: {result1['answer']}")

Q: What is machine learning?
A: Machine learning is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed. There are three main types: supervised learning, unsupervised learning, and reinforcement learning. Reinforcement learning learns through interaction with an environment using rewards and penalties.


In [None]:
"""
Update Chat History

Adding first Q&A pair to chat history:
- HumanMessage: User's question
- AIMessage: Assistant's response

This history will be used to understand follow-up questions.
"""
chat_history.extend([
    HumanMessage(content="What is machine learning"),
    AIMessage(content=result1['answer'])
])

In [None]:
# Display current chat history structure
chat_history

[HumanMessage(content='What is machine learning', additional_kwargs={}, response_metadata={}),
 AIMessage(content='Machine learning is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed. There are three main types: supervised learning, unsupervised learning, and reinforcement learning. Reinforcement learning learns through interaction with an environment using rewards and penalties.', additional_kwargs={}, response_metadata={})]

In [None]:
"""
Test Follow-up Question with Context

"What are its main types?" refers to machine learning types.
The conversational RAG system should:
1. Understand "its" refers to ML from previous question
2. Reformulate query to "What are the main types of machine learning?"
3. Retrieve relevant documents about ML types
4. Provide contextual answer
"""
# Follow-up question that depends on conversation context
result2 = conversational_rag_chain.invoke({
    "chat_history": chat_history,
    "input": "What are its main types?"  # "its" refers to ML from previous question
})
result2

{'chat_history': [HumanMessage(content='What is machine learning', additional_kwargs={}, response_metadata={}),
  AIMessage(content='Machine learning is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed. There are three main types: supervised learning, unsupervised learning, and reinforcement learning. Reinforcement learning learns through interaction with an environment using rewards and penalties.', additional_kwargs={}, response_metadata={})],
 'input': 'What are its main types?',
 'context': [Document(metadata={'source': 'data/doc_0.txt'}, page_content='Machine Learning Fundamentals\n\n    Machine learning is a subset of artificial intelligence that enables systems to learn \n    and improve from experience without being explicitly programmed. There are three main \n    types of machine learning: supervised learning, unsupervised learning, and reinforcement \n    learning. Supervised learning uses label

In [None]:
"""Display the follow-up answer"""
result2['answer']

'The main types of machine learning are supervised learning, unsupervised learning, and reinforcement learning. Supervised learning uses labeled data, while unsupervised learning finds patterns in unlabeled data. Reinforcement learning learns through interaction with an environment using rewards and penalties.'

In [None]:
"""
Conversational RAG Complete!

This notebook demonstrated:
✅ Document loading and chunking
✅ Vector embeddings with Google Gemini  
✅ ChromaDB vector store setup
✅ Similarity search and retrieval
✅ Complete RAG pipeline with LangChain
✅ Alternative LCEL implementation
✅ Dynamic knowledge base updates
✅ Conversational memory for context-aware Q&A

Your RAG system can now:
- Answer questions using your documents
- Maintain conversation context
- Add new knowledge dynamically  
- Provide source citations
- Handle follow-up questions intelligently
"""