# Chatbot via Data Retrieval with LangChain

## RAG Workflow Overview
```
Documents → Loading → Splitting → Embeddings → Vector Store → Retrieval → LLM → Response
```

## Structure:
1. **Document Loading** - Ingest from PDFs, web, YouTube, Notion
2. **Document Splitting** - Chunk documents with overlap and metadata preservation
3. **Vector Stores and Embeddings** - Convert text to vectors, store in Chroma
4. **Advanced Retrieval** - MMR, metadata filtering, compression, self-query
5. **Question Answering** - RetrievalQA with custom prompts and chain types
6. **Conversational Chat** - Memory-enabled chatbot with GUI

Each section includes multiple techniques and addresses common failure modes.

In [None]:
# Install all required packages
!pip install langchain langchain-community langchain-openai langchain-chroma \
             langchain-huggingface langchain-aws langchain-text-splitters \
             beautifulsoup4 chromadb sentence-transformers pypdf yt-dlp pydub \
             panel param docarray tiktoken lark scikit-learn

## Setup and Configuration

Configure Azure OpenAI credentials for the chat model and AWS credentials for the optional Bedrock embeddings.

In [None]:
import os
from google.colab import userdata
from langchain_openai import AzureChatOpenAI
import numpy as np

# Azure OpenAI credentials (read from Colab secrets)
os.environ["AZURE_OPENAI_API_KEY"] = userdata.get('eduhkkey')

# AWS credentials for the optional Bedrock embeddings
os.environ['AWS_ACCESS_KEY_ID'] = userdata.get('awsid')
os.environ['AWS_SECRET_ACCESS_KEY'] = userdata.get('awssecret')
os.environ['AWS_DEFAULT_REGION'] = 'us-east-1'

# Configure LLM
llm = AzureChatOpenAI(
    azure_endpoint="https://aai02.eduhk.hk/openai/deployments/gpt-4o-mini/chat/completions?Hello=",
    api_version="2024-02-15-preview",
    deployment_name="gpt-4o-mini",
    temperature=0,
    streaming=False,
)

print(f"LLM configured: {llm.deployment_name}")
print(f"Base URL: {llm.client._client._base_url}")

## 1. Document Loading

### Comprehensive Loading from Multiple Sources

Document loading is the first step in RAG, involving:
- Converting raw data to Document objects
- Preserving metadata (source, page numbers, etc.)
- Handling multiple formats and sources
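
To make the idea of a Document object concrete, here is a minimal sketch in plain Python. `SimpleDocument` and `load_pages` are hypothetical names used only for illustration; they mimic the `page_content` plus `metadata` shape that the loaders in this notebook return, but they are not LangChain classes.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a loader's Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_pages(source: str, pages: list[str]) -> list[SimpleDocument]:
    """Wrap raw page strings, recording the source and page number of each."""
    return [
        SimpleDocument(page_content=text, metadata={"source": source, "page": i})
        for i, text in enumerate(pages)
    ]


docs = load_pages("example.pdf", ["First page text.", "Second page text."])
for doc in docs:
    print(doc.metadata, "->", doc.page_content)
```

Keeping the source and page number next to the text is what later makes filtering and citation possible.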

In [None]:
from langchain_community.document_loaders import (
    WebBaseLoader, PyPDFLoader, NotionDirectoryLoader, TextLoader
)
from langchain_community.document_loaders.generic import GenericLoader
from langchain_community.document_loaders.parsers import OpenAIWhisperParser
from langchain_community.document_loaders.blob_loaders.youtube_audio import YoutubeAudioLoader
import bs4

# 1.1 Web Loading with HTML filtering
bs4_strainer = bs4.SoupStrainer(class_=("post-title", "post-header", "post-content"))
web_loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={"parse_only": bs4_strainer},
)
web_docs = web_loader.load()

print(f"Loaded {len(web_docs)} web documents")
print(f"First 200 chars: {web_docs[0].page_content[:200]}")
print(f"Metadata: {web_docs[0].metadata}")

In [None]:
# 1.2 PDF Loading with metadata preservation
# List the PDF files to load (empty by default)
sample_pdfs = [
    # You can add your own PDF paths here
    # "path/to/your/document1.pdf",
    # "path/to/your/document2.pdf",
]

pdf_docs = []
for pdf_path in sample_pdfs:
    if os.path.exists(pdf_path):
        loader = PyPDFLoader(pdf_path)
        docs = loader.load()
        pdf_docs.extend(docs)
        print(f"Loaded {len(docs)} pages from {pdf_path}")

# Demonstrate with web content if no PDFs available
if not pdf_docs:
    print("No PDFs found, using web content for demonstration")
    all_docs = web_docs
else:
    all_docs = pdf_docs + web_docs

print(f"Total documents loaded: {len(all_docs)}")

## 2. Document Splitting

### Advanced Text Splitting Strategies

Splitting is crucial for:
- Fitting within LLM context windows
- Maintaining semantic coherence
- Preserving document structure and metadata
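
The effect of chunk size and overlap can be seen in a small sliding-window sketch. This is a simplified illustration, not how `RecursiveCharacterTextSplitter` works internally: it cuts at fixed character offsets and ignores separators. The function name `chunk_with_overlap` is an assumption made for this example.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[dict]:
    """Cut text into fixed-size windows that share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        # Record the offset so each chunk can be traced back to its origin.
        chunks.append({"text": text[start:start + chunk_size], "start_index": start})
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_with_overlap(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk["start_index"], chunk["text"])
```

Each chunk begins with the last three characters of the previous one, which is how overlap keeps a sentence that straddles a boundary visible in both chunks.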

In [None]:
from langchain_text_splitters import (
    RecursiveCharacterTextSplitter, 
    CharacterTextSplitter,
    TokenTextSplitter,
    MarkdownHeaderTextSplitter
)

# 2.1 Recursive Character Text Splitter (Recommended)
recursive_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    add_start_index=True,  # Adds start index metadata
    separators=["\n\n", "\n", ". ", " ", ""]  # Hierarchical separators
)

# 2.2 Token-based splitter for precise token control
token_splitter = TokenTextSplitter(
    chunk_size=500,
    chunk_overlap=50
)

# 2.3 Structure-aware markdown splitter
markdown_splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=[
        ("#", "Header 1"),
        ("##", "Header 2"),
        ("###", "Header 3"),
    ]
)

# Split documents using recursive splitter
splits = recursive_splitter.split_documents(all_docs)

print(f"Original documents: {len(all_docs)}")
print(f"Split chunks: {len(splits)}")
print(f"Sample chunk metadata: {splits[0].metadata}")
print(f"Sample chunk content: {splits[0].page_content[:200]}...")

In [None]:
# 2.4 Demonstrate different splitting strategies
sample_text = """# Introduction to AI

Artificial Intelligence (AI) is transforming our world. It encompasses machine learning, deep learning, and natural language processing.

## Machine Learning

Machine learning enables computers to learn without explicit programming. Key algorithms include linear regression, decision trees, and neural networks.

### Supervised Learning

Supervised learning uses labeled training data to make predictions.
"""

# Compare splitting methods
char_splits = CharacterTextSplitter(chunk_size=100, chunk_overlap=20).split_text(sample_text)
recursive_splits = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20).split_text(sample_text)
markdown_splits = markdown_splitter.split_text(sample_text)

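print("Character Splitter Results:")
for i, chunk in enumerate(char_splits):
    print(f"Chunk {i}: {chunk[:50]}...")

print("\nRecursive Splitter Results:")
for i, chunk in enumerate(recursive_splits):
    print(f"Chunk {i}: {chunk[:50]}...")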

print("\nMarkdown Header Splitter Results:")
for i, doc in enumerate(markdown_splits):
    print(f"Chunk {i}: {doc.page_content[:50]}...")
    print(f"Metadata: {doc.metadata}")

## 3. Vector Stores and Embeddings

### Multiple Embedding Providers and Vector Storage

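Embeddings convert text to vectors that capture semantic meaning, so that texts with similar meaning end up close together. This notebook configures two providers: AWS Bedrock (hosted, requires AWS credentials) and Hugging Face sentence-transformers (free, runs locally).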
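
As a rough illustration of how vector closeness is measured, the sketch below computes cosine similarity in plain Python. The three vectors are made-up values, not real model output, and the helper names are assumptions for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


# Made-up 3-dimensional vectors standing in for real embeddings.
ml = [0.9, 0.1, 0.0]
ai = [0.8, 0.3, 0.1]
weather = [0.0, 0.2, 0.9]

print(f"ml vs ai:      {cosine_similarity(ml, ai):.3f}")
print(f"ml vs weather: {cosine_similarity(ml, weather):.3f}")

# After normalization, a plain dot product gives the same number.
dot_normalized = sum(x * y for x, y in zip(normalize(ml), normalize(ai)))
print(f"dot of normalized vectors: {dot_normalized:.3f}")
```

This is why a dot product is enough when the embedding model is asked to return normalized vectors.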

In [None]:
from langchain_aws import BedrockEmbeddings
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma
from langchain_core.documents import Document

# 3.1 Multiple Embedding Options

# AWS Bedrock Embeddings (requires AWS setup)
try:
    bedrock_embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1")
    print("AWS Bedrock embeddings configured")
except Exception as e:
    print(f"AWS Bedrock not available: {e}")
    bedrock_embeddings = None

# Hugging Face Embeddings (free, local)
hf_embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={'device': 'cpu'},
    encode_kwargs={'normalize_embeddings': True}
)

print("Hugging Face embeddings loaded")

# 3.2 Demonstrate embedding similarity
sentences = [
    "I love machine learning and AI",
    "Artificial intelligence and ML are fascinating",
    "The weather is beautiful today"
]

embeddings_list = [hf_embeddings.embed_query(sent) for sent in sentences]

# The vectors are L2-normalized (normalize_embeddings=True),
# so the dot product equals the cosine similarity
similarity_1_2 = np.dot(embeddings_list[0], embeddings_list[1])
similarity_1_3 = np.dot(embeddings_list[0], embeddings_list[2])

print(f"\nSimilarity between sentences 1 and 2 (related): {similarity_1_2:.4f}")
print(f"Similarity between sentences 1 and 3 (unrelated): {similarity_1_3:.4f}")

In [None]:
# 3.3 Create Vector Store with Chroma
vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=hf_embeddings,
    persist_directory="./chroma_db"
)

print(f"Vector store created with {vectorstore._collection.count()} documents")

# 3.4 Add documents with rich metadata
enhanced_docs = [
    Document(
        page_content="Machine learning is a subset of AI focused on algorithms that learn from data.",
        metadata={"topic": "machine_learning", "difficulty": "beginner", "year": 2024}
    ),
    Document(
        page_content="Deep learning uses neural networks with multiple layers to model complex patterns.",
        metadata={"topic": "deep_learning", "difficulty": "advanced", "year": 2024}
    ),
    Document(
        page_content="Natural language processing enables computers to understand human language.",
        metadata={"topic": "nlp", "difficulty": "intermediate", "year": 2024}
    )
]

vectorstore.add_documents(enhanced_docs)
print(f"Added {len(enhanced_docs)} documents with enhanced metadata")

# 3.5 Basic similarity search
query = "What is artificial intelligence?"
basic_results = vectorstore.similarity_search(query, k=3)

print(f"\nBasic similarity search for: '{query}'")
for i, doc in enumerate(basic_results):
    print(f"Result {i+1}: {doc.page_content[:100]}...")
    print(f"Metadata: {doc.metadata}")
    print()

## 4. Advanced Retrieval Techniques

### Addressing Common Retrieval Problems

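Basic similarity search has limitations:
- **Diversity**: The top results may be near-duplicates of each other; MMR trades some relevance for variety
- **Specificity**: Queries that target an attribute (topic, year) need metadata filtering
- **Relevance**: Retrieved chunks often carry irrelevant text; contextual compression keeps only the relevant portions
- **Query Understanding**: Natural-language requests mix content and constraints; a self-query retriever separates the two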
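
The diversity problem and the MMR remedy can be sketched in a few lines. This is a minimal plain-Python version of maximal marginal relevance under simplifying assumptions: vectors are plain lists, similarity is cosine, and the function names are hypothetical. It is not the vector store's implementation, which first narrows the corpus to a candidate pool (the `fetch_k` setting) before selecting.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def mmr(query_vec, doc_vecs, k, lambda_mult=0.5):
    """Return indices of up to k documents chosen by maximal marginal relevance."""
    candidates = list(range(len(doc_vecs)))
    selected: list[int] = []
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy: similarity to the closest already-selected document.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = [
    [1.0, 0.10],   # highly relevant
    [1.0, 0.12],   # near-duplicate of the first
    [0.6, -0.80],  # less relevant but different
]

print("relevance only:", mmr(query, docs, k=2, lambda_mult=1.0))
print("balanced:      ", mmr(query, docs, k=2, lambda_mult=0.5))
```

With `lambda_mult=1.0` the redundancy penalty vanishes and the near-duplicate is returned; with `0.5` the second slot goes to the document that adds something new.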

In [None]:
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

# 4.1 Maximum Marginal Relevance (MMR) - Balances relevance and diversity
mmr_retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 5, "fetch_k": 10, "lambda_mult": 0.5}
)

print("MMR Retrieval Results:")
mmr_results = mmr_retriever.invoke("machine learning algorithms")
for i, doc in enumerate(mmr_results):
    print(f"MMR Result {i+1}: {doc.page_content[:80]}...")

# 4.2 Metadata Filtering
filtered_results = vectorstore.similarity_search(
    "learning algorithms",
    k=3,
    filter={"topic": "machine_learning"}
)

print("\nFiltered Results (topic=machine_learning):")
for i, doc in enumerate(filtered_results):
    print(f"Filtered Result {i+1}: {doc.page_content[:80]}...")
    print(f"Metadata: {doc.metadata}")
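
Conceptually, metadata filtering restricts the candidate set before any ranking happens. The sketch below shows that order of operations with plain dictionaries; it uses word overlap as a stand-in for vector similarity, and `matches` and `filtered_search` are hypothetical helpers, not part of any vector store API.

```python
def matches(metadata: dict, where: dict) -> bool:
    """True when every key in `where` has the same value in `metadata`."""
    return all(metadata.get(key) == value for key, value in where.items())


def filtered_search(docs: list[dict], query: str, where: dict, k: int = 3) -> list[dict]:
    """Filter on metadata first, then rank the survivors by shared query words."""
    query_words = set(query.lower().split())
    candidates = [doc for doc in docs if matches(doc["metadata"], where)]

    def overlap(doc: dict) -> int:
        return len(query_words & set(doc["text"].lower().split()))

    return sorted(candidates, key=overlap, reverse=True)[:k]


corpus = [
    {"text": "Learning algorithms fit models to data", "metadata": {"topic": "machine_learning"}},
    {"text": "Deep learning algorithms stack many layers", "metadata": {"topic": "deep_learning"}},
    {"text": "Decision trees split data on feature values", "metadata": {"topic": "machine_learning"}},
]

results = filtered_search(corpus, "learning algorithms", where={"topic": "machine_learning"})
for doc in results:
    print(doc["metadata"]["topic"], "|", doc["text"])
```

The deep learning document matches the query words well but never reaches the ranking step, because the filter removes it first.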

In [None]:
# 4.3 Self-Query Retriever - LLM parses query for content and metadata
metadata_field_info = [
    AttributeInfo(
        name="topic",
        description="The topic of the document (machine_learning, deep_learning, nlp)",
        type="string"
    ),
    AttributeInfo(
        name="difficulty",
        description="The difficulty level (beginner, intermediate, advanced)",
        type="string"
    ),
    AttributeInfo(
        name="year",
        description="The year the content was created",
        type="integer"
    )
]

try:
    self_query_retriever = SelfQueryRetriever.from_llm(
        llm=llm,
        vectorstore=vectorstore,
        document_contents="Technical documents about AI and machine learning",
        metadata_field_info=metadata_field_info,
        verbose=True
    )
    
    # Test self-query with natural language
    self_query_results = self_query_retriever.invoke(
        "Find beginner-friendly content about machine learning from 2024"
    )
    
    print("Self-Query Results:")
    for i, doc in enumerate(self_query_results):
        print(f"Self-Query Result {i+1}: {doc.page_content}")
        print(f"Metadata: {doc.metadata}")
        print()
        
except Exception as e:
    print(f"Self-query retriever error: {e}")
    print("Falling back to regular similarity search")

In [None]:
# 4.4 Contextual Compression - Extract relevant portions
try:
    compressor = LLMChainExtractor.from_llm(llm)
    compression_retriever = ContextualCompressionRetriever(
        base_compressor=compressor,
        base_retriever=vectorstore.as_retriever(search_kwargs={"k": 3})
    )
    
    compressed_results = compression_retriever.invoke(
        "What are the key concepts in machine learning?"
    )
    
    print("Compressed Retrieval Results:")
    for i, doc in enumerate(compressed_results):
        print(f"Compressed Result {i+1}: {doc.page_content}")
        print(f"Metadata: {doc.metadata}")
        print()
        
except Exception as e:
    print(f"Compression retriever error: {e}")
    print("This feature requires a compatible LLM")

In [None]:
# 4.5 Alternative Retrieval Methods
# Both retrievers live in langchain_community and require scikit-learn
from langchain_community.retrievers import SVMRetriever, TFIDFRetriever

# Prepare text data for alternative retrievers
texts = [doc.page_content for doc in splits[:10]]  # Use first 10 splits

# SVM Retriever
svm_retriever = SVMRetriever.from_texts(texts, hf_embeddings)
svm_results = svm_retriever.invoke("machine learning")

print("SVM Retriever Results:")
for i, doc in enumerate(svm_results[:2]):
    print(f"SVM Result {i+1}: {doc.page_content[:100]}...")

# TF-IDF Retriever
tfidf_retriever = TFIDFRetriever.from_texts(texts)
tfidf_results = tfidf_retriever.invoke("machine learning")

print("\nTF-IDF Retriever Results:")
for i, doc in enumerate(tfidf_results[:2]):
    print(f"TF-IDF Result {i+1}: {doc.page_content[:100]}...")

## 5. Question Answering with RetrievalQA

### Multiple Chain Types and Custom Prompts

RetrievalQA combines document retrieval with LLM generation using different strategies:
- **Stuff**: Concatenate all documents (default)
- **Map-Reduce**: Process documents separately, then combine
- **Refine**: Iteratively refine answer with each document
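
The difference between the three strategies is mostly a matter of how many LLM calls are made and what each call sees. The sketch below illustrates this with a fake LLM that only counts calls. The function names and prompt wording are assumptions made for this example; they are not the prompts LangChain uses.

```python
from typing import Callable

LLM = Callable[[str], str]


def stuff(llm: LLM, docs: list[str], question: str) -> str:
    """One call that sees every document at once."""
    context = "\n\n".join(docs)
    return llm(f"Context:\n{context}\n\nQuestion: {question}")


def map_reduce(llm: LLM, docs: list[str], question: str) -> str:
    """One call per document, then one call to combine the partial answers."""
    partial = [llm(f"Context:\n{doc}\n\nQuestion: {question}") for doc in docs]
    combined = "\n".join(partial)
    return llm(f"Partial answers:\n{combined}\n\nCombine them to answer: {question}")


def refine(llm: LLM, docs: list[str], question: str) -> str:
    """Answer from the first document, then revise once per remaining document.

    Assumes at least one document.
    """
    answer = llm(f"Context:\n{docs[0]}\n\nQuestion: {question}")
    for doc in docs[1:]:
        answer = llm(
            f"Existing answer: {answer}\nNew context:\n{doc}\n\n"
            f"Refine the answer to: {question}"
        )
    return answer


def count_calls(strategy, docs: list[str], question: str) -> int:
    """Run a strategy against a fake LLM and report how many calls it made."""
    calls = []

    def fake_llm(prompt: str) -> str:
        calls.append(prompt)
        return f"answer {len(calls)}"

    strategy(fake_llm, docs, question)
    return len(calls)


docs = ["doc A", "doc B", "doc C"]
for strategy in (stuff, map_reduce, refine):
    print(strategy.__name__, count_calls(strategy, docs, "What is ML?"))
```

Stuff is the cheapest but needs all documents to fit in one prompt; the other two scale to more documents at the cost of extra calls.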

In [None]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains import RetrievalQA

# 5.1 Basic RAG Chain
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Custom prompt template
custom_prompt = ChatPromptTemplate.from_template(
    """You are a helpful AI assistant specializing in machine learning and AI.
    
    Use the following context to answer the question. If you don't know the answer based on the context, 
    say "I don't have enough information in the provided context to answer that question."
    
    Always cite which part of the context you used for your answer.
    
    Context: {context}
    
    Question: {question}
    
    Answer:"""
)

# Create RAG chain
rag_chain = (
    {"context": mmr_retriever | format_docs, "question": RunnablePassthrough()}
    | custom_prompt
    | llm
    | StrOutputParser()
)

# Test the chain
questions = [
    "What is machine learning?",
    "How does deep learning differ from traditional machine learning?",
    "What are the applications of natural language processing?"
]

print("RAG Chain Responses:")
for question in questions:
    try:
        response = rag_chain.invoke(question)
        print(f"\nQ: {question}")
        print(f"A: {response}")
        print("-" * 80)
    except Exception as e:
        print(f"Error processing question '{question}': {e}")

In [None]:
# 5.2 RetrievalQA with Different Chain Types
from langchain_core.prompts import PromptTemplate

# Custom prompt for RetrievalQA
qa_prompt = PromptTemplate(
    input_variables=["context", "question"],
    template="""Use the following pieces of context to answer the question at the end. 
    If you don't know the answer, just say that you don't know, don't try to make up an answer. 
    Use three sentences maximum. Keep the answer as concise as possible. 
    Always say "Thanks for asking!" at the end of the answer.
    
    {context}
    
    Question: {question}
    Helpful Answer:"""
)

# Test different chain types
chain_types = ["stuff", "map_reduce", "refine"]
test_question = "What are the main types of machine learning?"

for chain_type in chain_types:
    try:
        qa_chain = RetrievalQA.from_chain_type(
            llm=llm,
            chain_type=chain_type,
            retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
            return_source_documents=True,
            chain_type_kwargs={"prompt": qa_prompt} if chain_type == "stuff" else {}
        )
        
        result = qa_chain.invoke({"query": test_question})
        
        print(f"\n{chain_type.upper()} Chain Type:")
        print(f"Answer: {result['result']}")
        print(f"Source documents: {len(result['source_documents'])}")
        
    except Exception as e:
        print(f"Error with {chain_type} chain: {e}")

## 6. Conversational Chat with Memory

### Building a Stateful Chatbot

Conversational AI requires:
- **Memory**: Tracking conversation history
- **Context**: Understanding references to previous messages
- **Persistence**: Maintaining state across interactions
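
A session-keyed history is the core of the memory requirement. The sketch below shows one possible shape in plain Python; `SessionMemory` is a hypothetical class for illustration, and the optional `max_messages` trimming is an added assumption to keep prompts bounded, not something the notebook's chains do.

```python
from collections import defaultdict
from typing import Optional


class SessionMemory:
    """Keep a separate message history per session, optionally bounded."""

    def __init__(self, max_messages: Optional[int] = None):
        self._store = defaultdict(list)
        self.max_messages = max_messages

    def add(self, session_id: str, role: str, content: str) -> None:
        history = self._store[session_id]
        history.append((role, content))
        # Drop the oldest messages once the history exceeds the limit.
        if self.max_messages is not None and len(history) > self.max_messages:
            del history[: len(history) - self.max_messages]

    def history(self, session_id: str) -> list:
        # .get avoids creating an empty entry for unknown sessions.
        return list(self._store.get(session_id, []))


memory = SessionMemory(max_messages=4)
memory.add("alice", "human", "Hi, I'm Alice!")
memory.add("alice", "ai", "Hello Alice, how can I help?")
memory.add("alice", "human", "What's my name?")
memory.add("bob", "human", "Hello from another session")

for role, content in memory.history("alice"):
    print(f"{role}: {content}")
```

Because each call passes the whole stored history back to the model, the follow-up question about the name can be answered from the first message.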

In [None]:
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

# 6.1 Simple Chat with Memory
chat_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful AI assistant specializing in machine learning and artificial intelligence. "
               "Use the conversation history to provide context-aware responses."),
    MessagesPlaceholder(variable_name="messages"),
])

chat_chain = chat_prompt | llm

# Memory management
store = {}
def get_session_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

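# Only input_messages_key is set: the stored history is prepended to the new
# input under "messages". Also setting history_messages_key to the same key
# would overwrite the new input with the history alone.
chat_with_history = RunnableWithMessageHistory(
    chat_chain,
    get_session_history,
    input_messages_key="messages"
)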

# Test conversation
config = {"configurable": {"session_id": "user123"}}

conversation = [
    "Hi, I'm Alice! I'm new to machine learning.",
    "What's my name?",
    "Can you recommend some beginner-friendly ML topics?",
    "What did you just recommend?"
]

print("Conversational Chat Demo:")
for message in conversation:
    try:
        response = chat_with_history.invoke({"messages": message}, config)
        print(f"User: {message}")
        print(f"AI: {response.content}")
        print("-" * 50)
    except Exception as e:
        print(f"Error in conversation: {e}")
        break

In [None]:
# 6.2 Conversational RAG with Document Retrieval
# Set up memory for RAG
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True,
    output_key="answer"
)

# Create conversational retrieval chain
conversational_rag = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
    memory=memory,
    return_source_documents=True,
    verbose=True
)

# Test conversational RAG
rag_conversation = [
    "What is machine learning?",
    "Can you tell me more about the types you mentioned?",
    "How does it relate to artificial intelligence?",
    "What are some real-world applications?"
]

print("\nConversational RAG Demo:")
for question in rag_conversation:
    try:
        result = conversational_rag.invoke({"question": question})
        
        print(f"User: {question}")
        print(f"AI: {result['answer']}")
        
        if result.get('source_documents'):
            print(f"Sources: {len(result['source_documents'])} documents")
        
        print("-" * 50)
        
    except Exception as e:
        print(f"Error in conversational RAG: {e}")
        break

## 7. Advanced Features and Best Practices

### Production-Ready Enhancements

In [None]:
# 7.1 Evaluation and Metrics
from datetime import datetime
import json

def evaluate_retrieval(query, ground_truth_docs, retrieved_docs, k=3):
    """Simple evaluation metrics for retrieval quality.

    Pass hashable document identifiers (for example IDs or source strings)
    rather than Document objects, so the set operations below work.
    """
    top_k = retrieved_docs[:k]

    # Precision@K (guarded against an empty result list)
    relevant_retrieved = len(set(ground_truth_docs) & set(top_k))
    precision_at_k = relevant_retrieved / len(top_k) if top_k else 0.0

    # Recall@K
    recall_at_k = relevant_retrieved / len(set(ground_truth_docs)) if ground_truth_docs else 0.0

    return {
        "precision_at_k": precision_at_k,
        "recall_at_k": recall_at_k,
        "retrieved_count": len(retrieved_docs),
        "relevant_count": relevant_retrieved
    }

# 7.2 Response Quality Assessment
def assess_response_quality(question, answer, context_docs):
    """Assess response quality based on various criteria"""
    
    metrics = {
        "timestamp": datetime.now().isoformat(),
        "question": question,
        "answer_length": len(answer),
        "context_docs_count": len(context_docs),
        "has_citations": "according to" in answer.lower() or "based on" in answer.lower(),
        "confidence_indicators": {
            "uncertain": any(phrase in answer.lower() for phrase in ["i don't know", "unclear", "uncertain"]),
            "confident": any(phrase in answer.lower() for phrase in ["definitely", "certainly", "clearly"])
        }
    }
    
    return metrics

# Example evaluation
test_query = "What is machine learning?"
test_results = vectorstore.similarity_search(test_query, k=5)
test_response = rag_chain.invoke(test_query)

quality_metrics = assess_response_quality(
    test_query, 
    test_response, 
    test_results
)

print("Response Quality Assessment:")
print(json.dumps(quality_metrics, indent=2))
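
To make the precision and recall definitions concrete, here is a small worked example on document IDs. The IDs and the helper name `precision_recall_at_k` are made up for illustration.

```python
def precision_recall_at_k(relevant_ids, retrieved_ids, k):
    """Return (precision@k, recall@k) for a ranked list of retrieved IDs."""
    top_k = retrieved_ids[:k]
    hits = len(set(relevant_ids) & set(top_k))
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(set(relevant_ids)) if relevant_ids else 0.0
    return precision, recall


relevant = {"doc1", "doc4", "doc7"}                   # ground truth for one query
retrieved = ["doc1", "doc2", "doc4", "doc9", "doc7"]  # ranked retriever output

for k in (3, 5):
    precision, recall = precision_recall_at_k(relevant, retrieved, k)
    print(f"k={k}: precision={precision:.2f}, recall={recall:.2f}")
```

Raising k from 3 to 5 finds the last relevant document, so recall rises, while precision falls because two of the five results are irrelevant.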

In [None]:
# 7.3 Performance Optimization
import time

def benchmark_retrieval_methods():
    """Compare performance of different retrieval methods"""
    
    test_query = "machine learning algorithms"
    methods = {
        "Basic Similarity": lambda: vectorstore.similarity_search(test_query, k=3),
        "MMR": lambda: vectorstore.max_marginal_relevance_search(test_query, k=3),
        "Similarity with Score": lambda: vectorstore.similarity_search_with_score(test_query, k=3)
    }
    
    results = {}
    
    for method_name, method_func in methods.items():
        # perf_counter is a monotonic, high-resolution clock suited to timing
        start_time = time.perf_counter()
        try:
            result = method_func()
            end_time = time.perf_counter()
            
            results[method_name] = {
                "time_ms": (end_time - start_time) * 1000,
                "result_count": len(result),
                "success": True
            }
        except Exception as e:
            results[method_name] = {
                "time_ms": None,
                "error": str(e),
                "success": False
            }
    
    return results

# Run benchmark
benchmark_results = benchmark_retrieval_methods()

print("Retrieval Method Benchmark:")
for method, metrics in benchmark_results.items():
    if metrics["success"]:
        print(f"{method}: {metrics['time_ms']:.2f}ms, {metrics['result_count']} results")
    else:
        print(f"{method}: Error - {metrics['error']}")
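
A single timed run can be noisy, for example because of caching or a cold start on the first call. One possible refinement, sketched below, repeats each call and reports the median. The helper name `time_call` and the sorting workload are placeholders; in the notebook the callable would be one of the retrieval lambdas.

```python
import statistics
import time


def time_call(func, repeats: int = 5) -> dict:
    """Run func several times and summarize the wall-clock time in milliseconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append((time.perf_counter() - start) * 1000)
    return {
        "median_ms": statistics.median(durations),
        "min_ms": min(durations),
        "runs": repeats,
    }


# Placeholder workload standing in for a retrieval call.
stats = time_call(lambda: sorted(range(10_000), reverse=True))
print(f"median {stats['median_ms']:.3f} ms over {stats['runs']} runs")
```

The median is less sensitive than the mean to one slow outlier run.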

In [None]:
# 7.4 Robust Error Handling and Fallbacks
class RobustRAGChain:
    def __init__(self, primary_retriever, fallback_retriever, llm):
        self.primary_retriever = primary_retriever
        self.fallback_retriever = fallback_retriever
        self.llm = llm
        
    def retrieve_with_fallback(self, query, k=3):
        """Try primary retriever, fall back to secondary if needed"""
        try:
            docs = self.primary_retriever.invoke(query)
            if len(docs) >= k:
                return docs[:k], "primary"
        except Exception as e:
            print(f"Primary retriever failed: {e}")
        
        try:
            docs = self.fallback_retriever.invoke(query)
            return docs[:k], "fallback"
        except Exception as e:
            print(f"Fallback retriever failed: {e}")
            return [], "failed"
    
    def generate_response(self, query):
        """Generate response with error handling"""
        docs, retriever_used = self.retrieve_with_fallback(query)
        
        if not docs:
            return "I apologize, but I couldn't find relevant information to answer your question."
        
        context = "\n\n".join([doc.page_content for doc in docs])
        
        prompt = f"""Based on the following context, answer the question. If the context doesn't contain enough information, say so.
        
        Context: {context}
        
        Question: {query}
        
        Answer:"""
        
        try:
            response = self.llm.invoke(prompt)
            return f"{response.content}\n\n[Retrieved using: {retriever_used}]"
        except Exception as e:
            return f"I encountered an error while generating the response: {e}"

# Create robust RAG chain
robust_rag = RobustRAGChain(
    primary_retriever=mmr_retriever,
    fallback_retriever=tfidf_retriever,
    llm=llm
)

# Test robust retrieval
test_queries = [
    "What is machine learning?",
    "How do neural networks work?",
    "What is the meaning of life?"  # Question likely not in our documents
]

print("Robust RAG Chain Results:")
for query in test_queries:
    response = robust_rag.generate_response(query)
    print(f"\nQ: {query}")
    print(f"A: {response}")
    print("-" * 50)

## 8. Summary and Best Practices

### Key Takeaways for Production RAG Systems

**Document Processing:**
- Use appropriate loaders for different formats
- Preserve metadata for filtering and traceability
- Choose chunk sizes based on your domain and use case

**Retrieval Strategy:**
- Start with basic similarity search, add MMR for diversity
- Use metadata filtering for domain-specific queries
- Consider compression for long documents
- Implement fallback retrievers for robustness

**Generation Quality:**
- Custom prompts improve response quality
- Test different chain types (stuff, map-reduce, refine)
- Add conversation memory for interactive applications
- Implement proper error handling

**Evaluation and Monitoring:**
- Track retrieval precision and recall
- Monitor response quality and user satisfaction
- Benchmark different approaches
- Log queries and responses for analysis

**Scalability Considerations:**
- Use persistent vector stores for large datasets
- Consider distributed retrieval for high throughput
- Cache frequent queries
- Monitor latency and costs

This notebook covers the main building blocks of a RAG system: loading, splitting, embedding, retrieval, question answering, and conversational memory. Experiment with different combinations of these techniques based on your use case and requirements.