# 🚀 Notebook 07: Complete RAG Pipeline

**LangChain 1.0.5+ | Mixed Level Class**

## 🎯 Objectives
1. Build a complete RAG application
2. Use LCEL to chain components
3. Create production-ready code
4. Handle errors properly
5. Implement best practices

In [1]:
from dotenv import load_dotenv
from pathlib import Path
load_dotenv()
print("✅ Setup complete")

✅ Setup complete


## 1. Complete RAG Architecture

### 🔰 BEGINNER

```
User Query
    ↓
Retriever (finds relevant docs)
    ↓
Format Context
    ↓
Prompt Template
    ↓
LLM
    ↓
Answer
```
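
The flow above can be sketched in plain Python with no external services. This is a toy illustration under stated assumptions: the corpus, the word-overlap `retrieve` function, and the `fake_llm` stub are all hypothetical stand-ins for the vector store and chat model built later in this notebook.

```python
import re
from typing import Callable, List

# Hypothetical toy corpus standing in for the notebook's loaded documents.
CORPUS = [
    "RAG stands for Retrieval-Augmented Generation.",
    "FAISS is a library for similarity search over vectors.",
    "Chunk overlap keeps context across chunk boundaries.",
]

def tokenize(text: str) -> set:
    """Lowercase the text and keep only alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, k: int = 2) -> List[str]:
    """Rank documents by shared words (a stand-in for vector search)."""
    query_words = tokenize(query)
    ranked = sorted(
        CORPUS,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]

def format_context(docs: List[str]) -> str:
    """Join retrieved documents into one context string."""
    return "\n\n".join(docs)

def build_prompt(context: str, question: str) -> str:
    """Fill a minimal prompt with the context and the question."""
    return f"Context:\n{context}\n\nQuestion: {question}\n\nAnswer:"

def fake_llm(prompt: str) -> str:
    """Stub model: returns the first context line instead of calling an API."""
    return prompt.split("\n")[1]

def answer(question: str, llm: Callable[[str], str] = fake_llm) -> str:
    """Run the stages in the same order as the diagram."""
    docs = retrieve(question)
    context = format_context(docs)
    prompt = build_prompt(context, question)
    return llm(prompt)

print(answer("What does RAG stand for?"))
```

Each function corresponds to one box in the diagram, so swapping a stub for a real component leaves the rest of the flow unchanged.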

## 2. Step-by-Step RAG Build

In [2]:
# Step 1: Load Documents
from langchain_community.document_loaders import PyPDFLoader, TextLoader, CSVLoader

def load_all_documents(data_dir="sample_data"):
    """Load documents from multiple sources"""
    all_docs = []
    
    # Load text files
    for txt_file in Path(data_dir).glob("*.txt"):
        loader = TextLoader(str(txt_file))
        all_docs.extend(loader.load())
        print(f"  ✅ Loaded {txt_file.name}")
    
    # Load CSVs
    for csv_file in Path(data_dir).glob("*.csv"):
        loader = CSVLoader(str(csv_file))
        all_docs.extend(loader.load())
        print(f"  ✅ Loaded {csv_file.name}")
    
    return all_docs

print("Loading documents...")
documents = load_all_documents()
print(f"\n📄 Total documents: {len(documents)}")

Loading documents...
  ✅ Loaded notes.txt
  ✅ Loaded products.csv

📄 Total documents: 16


In [3]:
# Step 2: Split Documents
from langchain_text_splitters import RecursiveCharacterTextSplitter

print("Splitting documents...")
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)

chunks = text_splitter.split_documents(documents)
print(f"✂️ Created {len(chunks)} chunks")

Splitting documents...
✂️ Created 27 chunks
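
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal fixed-window sketch. It is a simplification, not the splitter's actual algorithm: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and newlines, while the hypothetical `split_with_overlap` below cuts at fixed character offsets.

```python
from typing import List

def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into fixed-size windows that share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=4)
print(pieces)  # ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz']
```

The last four characters of each piece reappear at the start of the next one, which is how overlap preserves context that would otherwise be cut at a chunk boundary.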


In [4]:
# Step 3: Create Vector Store
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

print("Creating embeddings...")
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

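# Reuse a saved index if one exists.
# Delete the folder to force a rebuild after the source documents change.
index_path = "./rag_vectorstore"
if Path(index_path).exists():
    print("Loading existing vector store...")
    vectorstore = FAISS.load_local(
        index_path,
        embeddings,
        # The saved index is unpickled on load: only load files you created yourself
        allow_dangerous_deserialization=True
    )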
else:
    print("Creating new vector store (this may take a minute)...")
    vectorstore = FAISS.from_documents(chunks, embeddings)
    vectorstore.save_local(index_path)

print("✅ Vector store ready")

Creating embeddings...
Creating new vector store (this may take a minute)...
✅ Vector store ready


In [5]:
# Step 4: Create Retriever
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 4}
)

print("✅ Retriever created")

✅ Retriever created
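
Conceptually, a similarity retriever with `k=4` scores every stored vector against the query vector and keeps the best four. The sketch below shows that idea with toy 2-D vectors and cosine similarity. The vectors and helper names are hypothetical, and the FAISS store may rank by a different metric (such as Euclidean distance), so treat this as an illustration of top-k ranking only.

```python
import math
from typing import List, Sequence, Tuple

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec: Sequence[float],
          doc_vecs: List[Sequence[float]],
          k: int = 4) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k most similar document vectors."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 2-D "embeddings"; real embedding vectors have many more dimensions.
doc_vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
results = top_k([1.0, 0.1], doc_vectors, k=2)
print(results)
```

Raising `k` pulls in more context at the cost of more prompt tokens and a higher chance of including irrelevant chunks.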


In [6]:
# Step 5: Create Prompt Template
from langchain_core.prompts import ChatPromptTemplate

template = """You are a helpful assistant. Answer the question based on the context below.
If you cannot answer based on the context, say "I don't have enough information to answer that."

Context:
{context}

Question: {question}

Answer:"""

prompt = ChatPromptTemplate.from_template(template)
print("✅ Prompt template created")

✅ Prompt template created
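
To see what the model will eventually receive, the placeholders can be filled by hand. The sketch below reuses the template text with plain `str.format`. That is an approximation for illustration: `ChatPromptTemplate` also wraps the result in chat messages, and `fill_prompt` is a hypothetical helper, not part of LangChain.

```python
# Same wording as the notebook's template, filled with plain str.format.
TEMPLATE = """You are a helpful assistant. Answer the question based on the context below.
If you cannot answer based on the context, say "I don't have enough information to answer that."

Context:
{context}

Question: {question}

Answer:"""

def fill_prompt(context: str, question: str) -> str:
    """Substitute the two placeholders and return the final prompt text."""
    return TEMPLATE.format(context=context, question=question)

filled = fill_prompt(
    context="RAG stands for Retrieval-Augmented Generation.",
    question="What is RAG?",
)
print(filled)
```

Printing the filled prompt like this is a quick way to check that the retrieved context is readable and that the fallback instruction is still present.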


In [7]:
# Step 6: Create LLM
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-3.5-turbo",
    temperature=0  # Minimizes randomness, which suits factual answers
)

print("✅ LLM initialized")

✅ LLM initialized


## 3. Building the RAG Chain with LCEL

### 🎓 INTERMEDIATE: LCEL Chain

In [8]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Helper function to format documents
def format_docs(docs):
    """Format retrieved documents into a single string"""
    return "\n\n".join(doc.page_content for doc in docs)

# Build the RAG chain
rag_chain = (
    {
        "context": retriever | format_docs,  # Retrieve and format docs
        "question": RunnablePassthrough()    # Pass through the question
    }
    | prompt          # Format the prompt
    | llm             # Generate answer
    | StrOutputParser()  # Extract text from response
)

print("✅ RAG chain created!")

✅ RAG chain created!
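
The `|` operator and the dict at the head of the chain can look like magic. The toy sketch below imitates the behavior with a hypothetical `Step` class: `|` feeds one step's output into the next, and the dict sends the same input to every branch. This is not LangChain's implementation, only a small model of the data flow.

```python
from typing import Any, Callable, Dict

class Step:
    """Tiny stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # a | b means: run a, then pass its output to b
        return Step(lambda value: other.invoke(self.invoke(value)))

def parallel(branches: Dict[str, Step]) -> Step:
    """Send the same input to every branch and collect the results in a dict."""
    return Step(lambda value: {name: step.invoke(value) for name, step in branches.items()})

# Hypothetical stand-ins for the notebook's components.
fake_retriever = Step(lambda question: [f"doc about {question}"])
join_docs = Step(lambda docs: "\n\n".join(docs))
passthrough = Step(lambda value: value)
fake_prompt = Step(lambda d: f"Context: {d['context']} | Question: {d['question']}")

chain = parallel({"context": fake_retriever | join_docs, "question": passthrough}) | fake_prompt
result = chain.invoke("RAG")
print(result)  # Context: doc about RAG | Question: RAG
```

The question string enters once, reaches both the retriever branch and the passthrough branch, and the prompt step receives a dict with both keys filled.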


## 4. Using the RAG System

In [9]:
# Ask questions!
questions = [
    "What is RAG?",
    "What are the recommended chunk sizes?",
    "What embedding models are available?"
]

for question in questions:
    print("\n" + "="*70)
    print(f"Question: {question}")
    print("="*70)
    
    # Get answer
    answer = rag_chain.invoke(question)
    print(f"\nAnswer:\n{answer}")
    
    # Show source documents
    print("\nSource Documents:")
    docs = retriever.invoke(question)
    for i, doc in enumerate(docs, 1):
        source = doc.metadata.get('source', 'Unknown')
        print(f"  {i}. {Path(source).name}")


Question: What is RAG?

Answer:
RAG stands for Retrieval-Augmented Generation.

Source Documents:
  1. notes.txt
  2. notes.txt
  3. notes.txt
  4. notes.txt

Question: What are the recommended chunk sizes?

Answer:
For general text, the recommended chunk size is 1000 characters with an overlap of 200.

Source Documents:
  1. notes.txt
  2. notes.txt
  3. notes.txt
  4. notes.txt

Question: What embedding models are available?

Answer:
The available embedding models are:
1. OpenAI text-embedding-3-small
2. OpenAI text-embedding-3-large
3. HuggingFace all-MiniLM-L6-v2
4. HuggingFace all-mpnet-base-v2
5. Google Gemini embedding-001

Source Documents:
  1. notes.txt
  2. notes.txt
  3. notes.txt
  4. notes.txt


## 5. Production-Ready Version

### 🎓 ADVANCED: With Error Handling

In [10]:
def rag_query(question: str, verbose: bool = True):
    """
    Production-ready RAG query function
    
    Args:
        question: User question
        verbose: Print source documents
    
    Returns:
        Answer string, or None if the query failed
    """
    try:
        # Get answer
        answer = rag_chain.invoke(question)
        
        if verbose:
            # Retrieve source docs
            docs = retriever.invoke(question)
            
            print(f"\nQuestion: {question}")
            print(f"\nAnswer: {answer}")
            print(f"\nSources ({len(docs)} documents):")
            for doc in docs:
                # .get avoids a KeyError when a document has no 'source' metadata
                print(f"  - {Path(doc.metadata.get('source', 'Unknown')).name}")
        
        return answer
        
    except Exception as e:
        print(f"❌ Error: {e}")
        return None

# Test it
rag_query("What is the transformer architecture?")


Question: What is the transformer architecture?

Answer: I don't have enough information to answer that.

Sources (4 documents):
  - notes.txt
  - notes.txt
  - notes.txt
  - notes.txt


"I don't have enough information to answer that."

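A broad `try/except` keeps the notebook running, but production code usually also validates input, retries transient failures, and logs instead of printing. Here is a minimal sketch of that pattern. The `safe_query` wrapper, its parameters, and the `flaky_chain` stub are hypothetical; in this notebook the callable passed in would be `rag_chain.invoke`.

```python
import logging
import time
from typing import Callable, Optional

logger = logging.getLogger("rag")

def safe_query(invoke: Callable[[str], str],
               question: str,
               retries: int = 2,
               delay: float = 0.0) -> Optional[str]:
    """Call `invoke(question)`, retrying on failure; return None if every attempt fails."""
    if not question or not question.strip():
        logger.warning("Rejected empty question")
        return None
    for attempt in range(1, retries + 2):  # one initial try plus `retries` retries
        try:
            return invoke(question)
        except Exception as exc:
            logger.warning("Attempt %d failed: %s", attempt, exc)
            if attempt <= retries:
                time.sleep(delay * attempt)  # simple linear backoff
    logger.error("Giving up after %d attempts", retries + 1)
    return None

# Stub that fails once and then succeeds, to exercise the retry path.
calls = {"count": 0}

def flaky_chain(question: str) -> str:
    calls["count"] += 1
    if calls["count"] == 1:
        raise TimeoutError("simulated timeout")
    return f"answer to: {question}"

result = safe_query(flaky_chain, "What is RAG?")
print(result)
```

In a real service you would likely catch narrower exception types, such as timeouts and rate-limit errors, so that programming mistakes still surface immediately.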
## 6. Streaming Responses

### 🎓 ADVANCED: Stream tokens as they're generated

In [11]:
# Stream response
question = "Explain machine learning in simple terms"

print(f"Question: {question}\n")
print("Answer (streaming):")

for chunk in rag_chain.stream(question):
    print(chunk, end="", flush=True)

print("\n")

Question: Explain machine learning in simple terms

Answer (streaming):
I don't have enough information to answer that.
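
When streaming, you often want to display chunks as they arrive and still keep the full answer for logging or caching. The sketch below shows that pattern with a hypothetical `fake_stream` generator in place of `rag_chain.stream(question)`, so it runs without an API key.

```python
from typing import Iterable, Iterator

def fake_stream(text: str, size: int = 5) -> Iterator[str]:
    """Yield the text in small pieces, imitating a token stream."""
    for i in range(0, len(text), size):
        yield text[i:i + size]

def stream_and_collect(chunks: Iterable[str]) -> str:
    """Print each chunk as it arrives and return the assembled answer."""
    parts = []
    for chunk in chunks:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()  # finish the line once the stream ends
    return "".join(parts)

full_answer = stream_and_collect(fake_stream("Streaming shows text as it arrives."))
```

Because `stream_and_collect` accepts any iterable of strings, the same function should work with the string chunks produced by the chain above, which ends in `StrOutputParser`.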



## 7. Best Practices Summary

### ✅ Production Checklist

- ✅ **Persist vector stores** (save_local)
- ✅ **Error handling** (try/except)
- ✅ **Logging** (track queries and performance)
- ✅ **Prompt engineering** (clear instructions)
- ✅ **Source attribution** (show where answers come from)
- ✅ **Testing** (evaluate with test questions)
- ✅ **Monitoring costs** (track API usage)
- ✅ **Rate limiting** (prevent abuse)
- ✅ **Caching** (cache common queries)
- ✅ **Metadata filtering** (improve precision)
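
Two of the checklist items, logging and caching, can be sketched together with the standard library. The example assumes an in-process cache is acceptable and uses a hypothetical `expensive_rag_call` stub where the real chain call would go. A multi-worker deployment would more likely need a shared cache.

```python
import logging
import time
from functools import lru_cache

# Configure handlers in the application entry point; none are set up here.
logger = logging.getLogger("rag.cache")

stats = {"backend_calls": 0}

def expensive_rag_call(question: str) -> str:
    """Hypothetical stand-in for rag_chain.invoke(question)."""
    stats["backend_calls"] += 1
    return f"answer to: {question}"

@lru_cache(maxsize=256)
def cached_query(normalized_question: str) -> str:
    """Answer a question, logging latency; repeated questions come from the cache."""
    start = time.perf_counter()
    answer = expensive_rag_call(normalized_question)
    elapsed = time.perf_counter() - start
    logger.info("query=%r took %.3fs", normalized_question, elapsed)
    return answer

def ask(question: str) -> str:
    """Normalize case and whitespace so trivially different questions share a cache entry."""
    return cached_query(" ".join(question.lower().split()))

first = ask("What is RAG?")
second = ask("  what is   RAG? ")
print(stats["backend_calls"])  # 1, because the second call hit the cache
```

Cached answers go stale when the underlying documents change, so clear the cache (`cached_query.cache_clear()`) whenever the vector store is rebuilt.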

### 📊 Optimization Tips

1. **Chunk size:** Test 500, 1000, 1500 with your data
2. **Retrieval k:** Start with 4, adjust based on quality
3. **Embeddings:** text-embedding-3-small for cost/performance
4. **LLM:** GPT-3.5-Turbo for speed, GPT-4 for quality
5. **Temperature:** 0 for factual, 0.7 for creative
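
Adjusting these settings "based on quality" works best with a small, fixed test set. One simple measure is the hit rate: the fraction of test questions whose expected source appears in the top-k results. The sketch below uses hypothetical ranked results and test cases. In this notebook, `retrieve` would wrap a retriever built with the given `k` and return the `source` metadata of each document.

```python
from typing import Callable, Dict, List

def hit_rate(retrieve: Callable[[str, int], List[str]],
             test_cases: Dict[str, str],
             k: int) -> float:
    """Fraction of questions whose expected source is among the top-k results."""
    if not test_cases:
        return 0.0
    hits = sum(
        1 for question, expected in test_cases.items()
        if expected in retrieve(question, k)
    )
    return hits / len(test_cases)

# Hypothetical ranked sources per question (best match first).
RANKED = {
    "What is RAG?": ["notes.txt", "products.csv"],
    "Which product is cheapest?": ["notes.txt", "notes.txt", "products.csv"],
}

def toy_retrieve(question: str, k: int) -> List[str]:
    """Return the first k ranked sources for a known question."""
    return RANKED[question][:k]

CASES = {
    "What is RAG?": "notes.txt",
    "Which product is cheapest?": "products.csv",
}

scores = {k: hit_rate(toy_retrieve, CASES, k) for k in (1, 2, 3)}
print(scores)  # {1: 0.5, 2: 0.5, 3: 1.0}
```

The same loop can compare chunk sizes or embedding models: rebuild the index for each setting, rerun the test set, and keep the configuration with the best score at an acceptable cost.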

## Summary

🎉 **Congratulations!** You've built a complete RAG system!

You now know:
- ✅ How to load and process documents
- ✅ Text splitting strategies
- ✅ Creating embeddings
- ✅ Using vector stores
- ✅ Retrieval strategies
- ✅ Building LCEL chains
- ✅ Production best practices

### 🚀 Next Steps

- Build a RAG app for your own documents
- Experiment with different embedding models
- Try advanced retrieval (hybrid search, re-ranking)
- Add conversation memory
- Deploy to production

**Happy building! 🎉**