# Document Chain Methods for RAG: A Comprehensive Tutorial

This notebook demonstrates four methods for combining retrieved documents into an answer in retrieval-augmented generation (RAG) systems. Each method trades off speed, context-window usage, and coverage of the retrieved information differently.

## Overview of Methods

1. **Stuff Documents Chain** - Simple concatenation of all documents  
2. **Refine Documents Chain** - Iterative refinement of answers  
3. **Map-Rerank Chain** - Score and rank individual document answers  
4. **Map-Reduce Chain** - Summarize then combine approach  

For the full implementation of each document chain strategy, see:

**`retrieval_playground/src/post_retrieval/document_chain.py`**

## Setup and Imports


In [None]:
import json
import numpy as np
import warnings
from typing import Dict, List, Any
from IPython.display import Image, display
import gc

from langchain_core.prompts import ChatPromptTemplate
import pandas as pd
from retrieval_playground.utils import config
from retrieval_playground.src.baseline_rag import RAG
from retrieval_playground.src.evaluation import RAGEvaluator
from retrieval_playground.src.pre_retrieval.chunking_strategies import ChunkingStrategy
from retrieval_playground.src.post_retrieval import document_chain
import logging
logging.getLogger().setLevel(logging.WARNING)
warnings.filterwarnings("ignore")


## Load Test Data and Initialize Components


In [None]:
def load_test_queries() -> List[Dict[str, Any]]:
    """Load test queries from JSON file."""
    queries_path = config.TESTS_DIR / "test_queries.json"
    with open(queries_path, 'r') as f:
        return json.load(f)

# Initialize RAG system and evaluator
strategy = ChunkingStrategy.UNSTRUCTURED
rag = RAG(strategy=strategy)
evaluator = RAGEvaluator(metrics=['faithfulness', 'answer_relevancy'])

# Load test queries (a single query, the second in the file, is enough for the demo)
test_queries = load_test_queries()[1:2]
ground_truths = [q["reference"] for q in test_queries]

print(f"Sample query: {test_queries[0]['user_input']}")
print(f"Number of test queries: {len(test_queries)}")


## Evaluation Helper Function


In [None]:
def evaluate_method(method_name: str, rag_results: List[Dict], ground_truths: List[str]):
    """Helper function to evaluate and display results for any method."""
    print(f"\n=== {method_name} Results ===")
    
    # Evaluate with RAGAS metrics
    scores = evaluator.evaluate_rag_results(rag_results, ground_truths)
    
    # Display metrics
    print(f"Average Faithfulness: {np.round(np.nanmean(scores['faithfulness']), 2)}")
    print(f"Average Answer Relevancy: {np.round(np.nanmean(scores['answer_relevancy']), 2)}")
    
    return scores


## Method 1: Stuff Documents Chain

**What it is:** Simply concatenates all retrieved documents and passes them to the LLM in a single prompt.

**When to use:**
- Small number of documents that fit within context window
- When you want the LLM to consider all information simultaneously
- When you need the fastest and simplest approach

**✅ Pros:** Simple, fast, considers all context at once <br>
**⚠️ Cons:** Limited by context window, may overwhelm LLM with too much information
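
As a rough illustration of the idea, the sketch below joins the document texts into one context string and checks it against a size budget before building a single prompt. The `stuff_documents` helper, the character-based budget, and the prompt wording are assumptions made for this example; they are not taken from `document_chain.py`.

```python
from typing import List


def stuff_documents(docs: List[str], question: str, max_chars: int = 8000) -> str:
    """Build one prompt containing every document (hypothetical helper)."""
    # Concatenate all documents, separated by blank lines.
    context = "\n\n".join(docs)
    # Crude stand-in for a context-window check: real code would count tokens.
    if len(context) > max_chars:
        raise ValueError(
            f"Context is {len(context)} chars; exceeds budget of {max_chars}"
        )
    return f"Question: {question}\n\nContext:\n{context}\n\nAnswer:"


prompt = stuff_documents(["Doc A text.", "Doc B text."], "What do the docs say?")
print(prompt)
```

The budget check makes the main limitation explicit: once the combined text no longer fits, one of the multi-call methods is needed.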


![Stuff Chain](../utils/images/stuff.png)

In [None]:
# Prompt template for the traditional LangChain approach, shown for reference;
# it is not passed to setup_stuff_chain() below
stuff_prompt = ChatPromptTemplate.from_template(
    """You are a helpful assistant that answers questions based on the provided context.
    Please provide a comprehensive answer based on the context below. 
    If the context doesn't contain enough information to answer the question, please say so.

    Question: {question} 

    Context:
    {context}

    Answer:"""
)

stuff_chain = document_chain.setup_stuff_chain()

# Test the method
stuff_results = []
for test_query in test_queries:
    query = test_query["user_input"]
    docs = [doc[0] for doc in rag.retrieve_context(query)]
    result = stuff_chain.invoke({"context": docs, "question": query})
    stuff_results.append({
        "question": query, 
        "answer": result, 
        "context": [{"content": doc.page_content} for doc in docs]
    })

# Evaluate
stuff_scores = evaluate_method("Stuff Documents Chain", stuff_results, ground_truths)

In [None]:
del stuff_prompt, stuff_results, stuff_chain
gc.collect()

## Method 2: Refine Documents Chain

**What it is:** Processes documents sequentially, starting with an initial answer from the first document, then refining it with each subsequent document.

**When to use:**
- When you have many documents that don't fit in context window
- When you want iterative improvement of answers
- When document order matters

**✅ Pros:** Handles large document sets, iterative refinement, maintains context <br>
**⚠️ Cons:** Slower (multiple LLM calls), order-dependent, potential information loss
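
The sequential loop can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: `refine_answer` and `stub_llm` are hypothetical names, and the stub stands in for a real model call so the control flow is visible without any API access.

```python
from typing import Callable, List


def refine_answer(docs: List[str], question: str, llm: Callable[[str], str]) -> str:
    """Sequentially refine an answer, one document at a time (illustrative)."""
    if not docs:
        return ""
    # Initial answer from the first document only.
    answer = llm(f"Question: {question}\n\nContext:\n{docs[0]}")
    # Each later document gets one call that also sees the running answer.
    for doc in docs[1:]:
        answer = llm(
            f"Existing answer:\n{answer}\n\nNew context:\n{doc}\n\n"
            f"Refine the answer to: {question}"
        )
    return answer


calls = []


def stub_llm(prompt: str) -> str:
    # Stand-in for a real model: records the prompt and returns a numbered draft.
    calls.append(prompt)
    return f"draft-{len(calls)}"


final = refine_answer(["doc one", "doc two", "doc three"], "What changed?", stub_llm)
print(final, len(calls))
```

The number of model calls equals the number of documents, and each call depends on the previous one, which is why this method cannot be parallelized and why document order affects the result.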


![Refine Chain](../utils/images/refine.png)

In [None]:
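# Initial-answer prompt, applied to the first document (shown for reference;
# it is not passed to setup_refine_chain() below)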
summarize_prompt = ChatPromptTemplate.from_messages([
    ("human", 
    """You are a helpful assistant that answers questions based on the given context.

    Question: {question}

    Context:
    {context}

    Provide the best possible answer based on this context:""")
])


# Refinement prompt
refine_template = """
We have an existing answer so far:
{existing_answer}

Here is some new context:
------------
{context}
------------

Refine the existing answer where needed, keeping it accurate and comprehensive.
If the new context is not useful, keep the answer unchanged.
"""

refine_chain = document_chain.setup_refine_chain()

# Test the method
refine_results = []
for test_query in test_queries:
    query = test_query["user_input"]
    docs = [doc[0] for doc in rag.retrieve_context(query)]
    result = refine_chain.invoke({"input_documents": docs, "question": query})["output_text"]
    refine_results.append({
        "question": query, 
        "answer": result, 
        "context": [{"content": doc.page_content} for doc in docs]
    })
# Evaluate
refine_scores = evaluate_method("Refine Documents Chain", refine_results, ground_truths)

In [None]:
del summarize_prompt, refine_template, refine_results, refine_chain
gc.collect()

## Method 3: Map-Rerank Chain

**What it is:** Processes each document independently to generate answers with confidence scores, then selects the highest-scoring answer.

**When to use:**
- When you want to identify the most relevant document
- When documents might contain conflicting information
- When you need confidence scores for answers

**✅ Pros:** Parallel processing, confidence scoring, handles conflicting information <br>
**⚠️ Cons:** Only uses one document's information, may miss connections between documents
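
A minimal sketch of the score-and-select step, assuming each per-document response ends with a `Score: <n>` line. The helper names and the stub model are hypothetical and only illustrate the selection logic.

```python
import re
from typing import Callable, List, Tuple


def parse_scored_answer(text: str) -> Tuple[str, int]:
    """Split a response ending in 'Score: <n>' into (answer, score); score 0 if missing."""
    match = re.search(r"Score:\s*(\d+)", text)
    if match is None:
        return text.strip(), 0
    return text[: match.start()].strip(), int(match.group(1))


def map_rerank(docs: List[str], question: str, llm: Callable[[str], str]) -> Tuple[str, int]:
    """Answer per document, then keep the highest-scoring answer."""
    scored = [
        parse_scored_answer(llm(f"Question: {question}\n\nContext:\n{doc}"))
        for doc in docs
    ]
    # max() returns the first item with the top score when scores tie.
    return max(scored, key=lambda pair: pair[1])


def stub_llm(prompt: str) -> str:
    # Stand-in for a real model: confident only when the context mentions "Paris".
    if "Paris" in prompt:
        return "Paris\nScore: 9"
    return "Not enough information.\nScore: 1"


best = map_rerank(
    ["Berlin is in Germany.", "Paris is in France."], "Capital of France?", stub_llm
)
print(best)
```

Because only the top-scoring answer survives, anything useful in the lower-scoring documents is discarded, which is the main drawback listed above.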


![Map-Rerank Chain](../utils/images/map_rerank.png)

In [None]:
prompt_template = """
    You are a helpful assistant. 
    Answer the following question using ONLY the given context. 
    If the context does not contain the answer, say "Not enough information."

    Question: {question}

    Context:
    {context}

    Provide your answer and a confidence score (1-10) in this format:
    <Answer>
    Score: <Score>
    """

rerank_chain = document_chain.setup_map_rerank_chain()

# Test the method
rerank_results = []
for test_query in test_queries:
    query = test_query["user_input"]
    docs = [doc[0] for doc in rag.retrieve_context(query)]
    result = rerank_chain.invoke({"input_documents": docs, "question": query})["output_text"]
    rerank_results.append({
        "question": query, 
        "answer": result, 
        "context": [{"content": doc.page_content} for doc in docs]
    })

# Evaluate
rerank_scores = evaluate_method("Map-Rerank Chain", rerank_results, ground_truths)

In [None]:
del prompt_template, rerank_results, rerank_chain
gc.collect()

## Method 4: Map-Reduce Chain

**What it is:** First maps (summarizes) each document independently, then reduces (combines) all summaries to generate a final answer.

**When to use:**
- Large number of documents that exceed context window
- When you want to consider information from all documents
- When documents contain complementary information

**✅ Pros:** Handles large document sets, parallel processing, considers all information <br>
**⚠️ Cons:** Two-step process (slower), potential information loss in summarization
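
The two phases can be sketched as below. This is an illustration under assumptions: `map_reduce` and `stub_llm` are hypothetical, the thread pool is just one way to run the independent map calls concurrently, and the stub replaces a real model.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def map_reduce(docs: List[str], question: str, llm: Callable[[str], str]) -> str:
    """Summarize each document independently, then answer from the summaries."""
    # Map step: the per-document calls are independent, so they can run in parallel.
    with ThreadPoolExecutor(max_workers=4) as pool:
        summaries = list(
            pool.map(lambda doc: llm(f"Write a concise summary of the following: {doc}"), docs)
        )
    # Reduce step: a single call over the combined summaries.
    joined = "\n".join(summaries)
    return llm(
        f"The following is a set of summaries:\n{joined}\n\n"
        f"Based on these summaries, answer the question: {question}"
    )


def stub_llm(prompt: str) -> str:
    # Stand-in for a real model: "summarizes" by keeping the first 12 characters
    # of the document, and "answers" by counting the summary lines it was given.
    if prompt.startswith("Write a concise summary"):
        return prompt.split(": ", 1)[1][:12]
    summary_block = prompt.split("summaries:\n", 1)[1].split("\n\n", 1)[0]
    return f"answer from {len(summary_block.splitlines())} summaries"


result = map_reduce(["alpha document text", "beta document text"], "What is covered?", stub_llm)
print(result)
```

Note that the map prompt in this sketch does not see the question, so a summary may drop the one detail the question needs. That is the information-loss risk listed under the cons.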


![Map-Reduce Chain](../utils/images/map_reduce.png)

In [None]:
# Map prompt - summarizes each document
map_template = "Write a concise summary of the following: {docs}"

reduce_template = """The following is a set of summaries:
{docs}

Based on these summaries, answer the question: {question}"""

map_reduce_chain = document_chain.setup_map_reduce_chain()

# Test the method
map_reduce_results = []
for test_query in test_queries:
    query = test_query["user_input"]
    docs = [doc[0] for doc in rag.retrieve_context(query)]
    result = map_reduce_chain.invoke({"input_documents": docs, "question": query})["output_text"]
    map_reduce_results.append({
        "question": query, 
        "answer": result, 
        "context": [{"content": doc.page_content} for doc in docs]
    })

# Evaluate
map_reduce_scores = evaluate_method("Map-Reduce Chain", map_reduce_results, ground_truths)

In [None]:
del map_template, reduce_template, docs, result, map_reduce_results, map_reduce_chain
gc.collect()

## Method Comparison and Summary

Let's compare all methods side by side:


In [None]:
# Create comparison table
comparison_data = {
    'Method': ['Stuff Chain', 'Refine Chain', 'Map-Rerank Chain', 'Map-Reduce Chain'],
    'Faithfulness': [
        np.round(np.nanmean(stuff_scores['faithfulness']), 2),
        np.round(np.nanmean(refine_scores['faithfulness']), 2),
        np.round(np.nanmean(rerank_scores['faithfulness']), 2),
        np.round(np.nanmean(map_reduce_scores['faithfulness']), 2)
    ],
    'Answer Relevancy': [
        np.round(np.nanmean(stuff_scores['answer_relevancy']), 2),
        np.round(np.nanmean(refine_scores['answer_relevancy']), 2),
        np.round(np.nanmean(rerank_scores['answer_relevancy']), 2),
        np.round(np.nanmean(map_reduce_scores['answer_relevancy']), 2)
    ],
    'Best For': [
        'Small doc sets, simple queries',
        'Large doc sets, iterative refinement',
        'Confidence scoring, conflicting info',
        'Large doc sets, comprehensive answers'
    ],
    'Speed': ['Fastest', 'Slow', 'Medium', 'Slow'],
    'Context Window': ['Must fit all docs', 'One doc + answer per call', 'One doc per call', 'One doc per map call']
}

comparison_df = pd.DataFrame(comparison_data)
print("\n=== Method Comparison ===")
print(comparison_df.to_string(index=False))

## LangGraph Implementation

```python 
app = document_chain.setup_refine_chain_langgraph()
# app = document_chain.setup_map_rerank_chain_langgraph()
# app = document_chain.setup_map_reduce_chain_langgraph()

# --- Option 1: Render graph directly in notebook ---
try:
    display(
        Image(app.get_graph().draw_mermaid_png())
    )
except Exception as e:
    print("Graph rendering failed. Try Option 2 below.")
    print("Error:", e)

# --- Option 2: Get Mermaid code for manual rendering ---
# If Option 1 fails, uncomment the lines below.
# Copy the printed code into https://mermaid.live/ to view your graph.

# mermaid_code = app.get_graph().draw_mermaid()
# print(mermaid_code)
```

<!-- ![Graph](../utils/images/graph_refine.png) -->

```python
# ============================================================
# Test LangGraph implementation
# ============================================================

# Print header for clarity
print("\n=== Testing LangGraph Implementation ===")

# Take a sample query (first test case input)
query = test_queries[0]["user_input"]

# Retrieve relevant documents for the given query
docs = [doc[0].page_content for doc in rag.retrieve_context(query)]

# Run LangGraph refine app with query and retrieved docs
final_state = app.invoke({"question": query, "docs": docs, "index": 0})

# Display only the first 200 characters of the answer for readability
print(f"LangGraph Result: {final_state['answer'][:200]}...") 
```
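
The input state above carries `question`, `docs`, and an `index` counter, and the result is read from `answer`. The plain-Python sketch below shows one way such a state could drive a refine loop: a node function returns an updated state and a router decides whether to loop or stop. It deliberately avoids the LangGraph API; `refine_step`, `should_continue`, `run`, and `stub_llm` are hypothetical names, and the real graph in `document_chain.py` may be structured differently.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]


def refine_step(state: State, llm: Callable[[str], str]) -> State:
    """Node: consume the document at state['index'] and advance the index."""
    doc = state["docs"][state["index"]]
    previous = state.get("answer", "")
    prompt = f"Question: {state['question']}\nExisting answer: {previous}\nContext: {doc}"
    return {**state, "answer": llm(prompt), "index": state["index"] + 1}


def should_continue(state: State) -> str:
    """Router: loop while unread documents remain."""
    return "refine" if state["index"] < len(state["docs"]) else "end"


def run(state: State, llm: Callable[[str], str]) -> State:
    # Drive the loop by hand; a graph framework would do this through its edges.
    while should_continue(state) == "refine":
        state = refine_step(state, llm)
    return state


def stub_llm(prompt: str) -> str:
    # Stand-in for a real model: echoes the context it was shown.
    return "answer after " + prompt.rsplit("Context: ", 1)[1]


final_state = run(
    {"question": "What changed?", "docs": ["doc one", "doc two"], "index": 0},
    stub_llm,
)
print(final_state["index"], final_state["answer"])
```

Keeping the loop counter in the state rather than in a local variable is what lets each step be inspected or resumed independently.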

## 📌 Recommendations  

### Choose Your Method Based On:  

1. **Document Count & Size**  
   - Few small documents → **Stuff Chain**  
   - Many documents → **Refine Chain** or **Map-Reduce Chain**  

2. **Information Quality**  
   - Conflicting information → **Map-Rerank Chain**  
   - Complementary information → **Map-Reduce Chain**  

3. **Performance Requirements**  
   - Speed critical → **Stuff Chain**  
   - Quality critical → **Refine Chain** or **Map-Reduce Chain**  

4. **Special Needs**  
   - Need confidence scores → **Map-Rerank Chain**  
   - Need iterative improvement → **Refine Chain**  
   - Need comprehensive coverage → **Map-Reduce Chain**  
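
The rules of thumb above can be folded into a small selector. This is only an illustrative heuristic: `choose_method`, its parameters, and the priority order among the rules are assumptions, not part of the library.

```python
def choose_method(
    fits_in_context: bool,
    need_confidence: bool = False,
    conflicting_sources: bool = False,
    order_matters: bool = False,
) -> str:
    """Map the rules of thumb to a chain name (illustrative heuristic)."""
    # Confidence scores or conflicting sources point to per-document scoring.
    if need_confidence or conflicting_sources:
        return "map_rerank"
    # Everything fits in one prompt: the simplest and fastest option.
    if fits_in_context:
        return "stuff"
    # Too much text for one prompt: refine when order matters, else map-reduce.
    return "refine" if order_matters else "map_reduce"


print(choose_method(fits_in_context=True))
print(choose_method(fits_in_context=False, order_matters=True))
print(choose_method(fits_in_context=False, need_confidence=True))
```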

### Modern vs Traditional Approaches  

- **LangGraph**: Best for complex workflows, debugging, and fine-grained control  
- **Traditional LangChain**: Easier for standard use cases, minimal setup required  


## 🧪 Experiment with Your Own Data

Use the code below to test different methods with your own queries:

```python
def test_all_methods(query: str, show_context: bool = False):
    """Test all methods with a custom query."""
    print(f"\n=== Testing Query: {query} ===")
    
    # Get documents
    docs = [doc[0] for doc in rag.retrieve_context(query)]
    
    if show_context:
        print(f"\nRetrieved {len(docs)} documents")
        for i, doc in enumerate(docs[:2]):  # Show first 2
            print(f"Doc {i+1}: {doc.page_content[:100]}...")
    
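    # Rebuild the chains: the cleanup cells earlier in the notebook deleted the originals
    stuff_chain = document_chain.setup_stuff_chain()
    refine_chain = document_chain.setup_refine_chain()
    rerank_chain = document_chain.setup_map_rerank_chain()
    map_reduce_chain = document_chain.setup_map_reduce_chain()

    # Test each method
    methods = {
        "Stuff Chain": lambda: stuff_chain.invoke({"context": docs, "question": query}),
        "Refine Chain": lambda: refine_chain.invoke({"input_documents": docs, "question": query})["output_text"],
        "Map-Rerank Chain": lambda: rerank_chain.invoke({"input_documents": docs, "question": query})["output_text"],
        "Map-Reduce Chain": lambda: map_reduce_chain.invoke({"input_documents": docs, "question": query})["output_text"]
    }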
    
    for method_name, method_func in methods.items():
        try:
            result = method_func()
            print(f"\n{method_name}: {result[:200]}...")
        except Exception as e:
            print(f"\n{method_name}: Error - {str(e)}")

# Example usage
test_all_methods("What is the key innovation of the proposed Riemannian change point detection method?", show_context=True)
```

## Conclusion

This tutorial demonstrated four different approaches to combining documents in RAG systems:

1. **Stuff Documents Chain** - Simple and fast for small document sets  
2. **Refine Documents Chain** - Iterative improvement for large document sets  
3. **Map-Rerank Chain** - Confidence-based selection from individual documents  
4. **Map-Reduce Chain** - Comprehensive approach for large, complementary document sets  

Each method has its strengths and is suitable for different scenarios. The choice depends on your specific requirements for speed, accuracy, document size, and information quality.

**Key Takeaways:**  
- **Stuff Chain**: Best for simple cases with few documents  
- **Refine Chain**: Best for iterative improvement with many documents  
- **Map-Rerank**: Best when you need confidence scores and document ranking  
- **Map-Reduce**: Best for comprehensive analysis of large document sets  
- **LangGraph**: Provides better control and debugging capabilities  
- **Traditional LangChain**: Simpler setup for standard use cases  


Images Reference: https://mlpills.substack.com/p/issue-71-langchains-text-processing