# 13. Fusion RAG - Reciprocal Rank Fusion

**Complexity:** ⭐⭐⭐

## Overview

**RAG-Fusion** combines multiple query perspectives using Reciprocal Rank Fusion (RRF) to improve retrieval quality. It's an enhancement over Multi-Query RAG (notebook 05) that uses a more sophisticated fusion algorithm.

### The Problem

Single queries have limitations:
- May miss relevant documents due to wording
- Can't capture multiple aspects of complex questions
- Vector similarity is sensitive to phrasing

Multi-Query RAG helps, but simple deduplication loses ranking information.

### The Solution

RAG-Fusion improves on Multi-Query RAG by:
1. Generating multiple query variations (like Multi-Query)
2. Retrieving documents for each query
3. **Using Reciprocal Rank Fusion** to combine results intelligently
4. Re-ranking documents based on combined scores

### Reciprocal Rank Fusion (RRF)

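RRF is a simple rank aggregation method. Each document's score depends only on its rank positions in the result lists, not on the retrievers' raw similarity scores: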

```
RRF_score(doc) = Σ (1 / (k + rank_i))
```

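Where:
- `k` = smoothing constant (typically 60); larger values shrink the gap between top and lower ranks
- `rank_i` = 1-based rank of the document in the i-th query's results
- The sum runs only over the queries whose results contain the document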

**Example:**
```
Query 1 ranks: [A, B, C, D]
Query 2 ranks: [C, A, E, F]
Query 3 ranks: [B, C, A, G]

Document A:
  - Appears at rank 1 in Q1: 1/(60+1) = 0.0164
  - Appears at rank 2 in Q2: 1/(60+2) = 0.0161
  - Appears at rank 3 in Q3: 1/(60+3) = 0.0159
  - Total RRF score: 0.0484

Document C:
  - Appears at rank 3 in Q1: 1/(60+3) = 0.0159
  - Appears at rank 1 in Q2: 1/(60+1) = 0.0164
  - Appears at rank 2 in Q3: 1/(60+2) = 0.0161
  - Total RRF score: 0.0484
```
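To check the arithmetic, here is a minimal, framework-free sketch of the formula. It uses plain string IDs in place of `Document` objects, and `rrf_scores` is an illustrative name, not one of the notebook's `shared` helpers:

```python
from collections import defaultdict


def rrf_scores(ranked_lists: list[list[str]], k: int = 60) -> dict[str, float]:
    """Sum 1 / (k + rank) over every list a document appears in (ranks are 1-based)."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return dict(scores)


# The three ranked lists from the example above
rankings = [
    ["A", "B", "C", "D"],  # Query 1
    ["C", "A", "E", "F"],  # Query 2
    ["B", "C", "A", "G"],  # Query 3
]

scores = rrf_scores(rankings)
for doc_id, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{doc_id}: {score:.4f}")
```

B, which appears in two lists, lands third at about 0.0325. E, D, F, and G each appear once and score between 0.0156 and 0.0159. Because A's and C's terms are added in different orders, their floating-point totals may differ in the last bits, so treat them as tied rather than relying on which one prints first.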

**Why RRF works:**
- ✅ Favors documents that appear in multiple query results
- ✅ Considers both frequency and rank position
- ✅ No need for score normalization
- ✅ Insensitive to outlier raw scores, since only rank positions enter the formula
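The "no normalization" point is easiest to see next to a naive alternative. This sketch assumes two hypothetical retrievers whose raw scores live on very different scales (a cosine-style score and a BM25-style score). It compares adding the raw scores directly with fusing the ranks:

```python
from collections import defaultdict

# Hypothetical raw scores from two retrievers on very different scales
dense_scores = {"doc1": 0.82, "doc2": 0.80, "doc3": 0.35}   # e.g. cosine similarity
keyword_scores = {"doc3": 24.0, "doc1": 3.1, "doc4": 2.9}  # e.g. BM25-style scores


def to_ranking(raw: dict[str, float]) -> list[str]:
    """Order document IDs by raw score, best first."""
    return sorted(raw, key=raw.get, reverse=True)


def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists with Reciprocal Rank Fusion; return IDs, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Naive fusion: add raw scores directly, so the larger keyword scale dominates
naive = defaultdict(float)
for raw in (dense_scores, keyword_scores):
    for doc_id, score in raw.items():
        naive[doc_id] += score
naive_order = sorted(naive, key=naive.get, reverse=True)

rrf_order = rrf([to_ranking(dense_scores), to_ranking(keyword_scores)])

# Rescaling one retriever's scores keeps its ranking, so RRF is unchanged
rescaled = {doc_id: s * 100 for doc_id, s in keyword_scores.items()}
rrf_rescaled = rrf([to_ranking(dense_scores), to_ranking(rescaled)])

print("naive:", naive_order)
print("rrf:  ", rrf_order)
print("rrf after rescaling:", rrf_rescaled)
```

Adding raw scores lets the larger-scale retriever decide, so `doc3` wins. RRF ranks `doc1` first because it places well in both lists, and multiplying one retriever's scores by 100 changes nothing. The flip side is that RRF also ignores genuine confidence gaps in the raw scores.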

### Pipeline

```
Query → Generate N alternative queries → Retrieve for each query
    → Apply RRF algorithm → Re-rank documents → Generate answer
```
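The pipeline can be sketched end to end with stand-ins for the LLM and the vector store. Everything here is hypothetical: `generate_queries` returns canned rewrites, `retrieve` ranks a four-document toy corpus by word overlap, and the final answer-generation call is omitted. Only the control flow is meant to mirror the diagram:

```python
from collections import defaultdict

# Toy corpus standing in for the vector store (hypothetical data)
CORPUS = {
    "d1": "LCEL composes runnables with the pipe operator",
    "d2": "Retrievers return documents relevant to a query",
    "d3": "Chains built with LCEL support streaming and batching",
    "d4": "Chat models take messages and return messages",
}


def generate_queries(question: str) -> list[str]:
    """Stub for the LLM step: the original question plus canned rewrites."""
    return [question, "LCEL pipe operator", "streaming chains LCEL"]


def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Stub retriever: rank document IDs by word overlap with the query."""
    words = set(query.lower().split())
    overlap = {doc_id: len(words & set(text.lower().split())) for doc_id, text in CORPUS.items()}
    ranked = sorted(overlap, key=overlap.get, reverse=True)
    return [doc_id for doc_id in ranked[:top_k] if overlap[doc_id] > 0]


def fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion over ranked ID lists, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


def rag_fusion_context(question: str, k_final: int = 3) -> list[str]:
    """Generate queries, retrieve for each, fuse with RRF, keep the top k_final texts."""
    queries = generate_queries(question)
    rankings = [retrieve(q) for q in queries]
    fused_ids = fuse(rankings)
    return [CORPUS[doc_id] for doc_id in fused_ids[:k_final]]


print(rag_fusion_context("what is LCEL"))
```

Swapping the stubs for `query_gen_chain.invoke` and `retriever.invoke` gives roughly the shape of `fusion_retriever` in Section 5.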

### Fusion RAG vs Multi-Query RAG

| Aspect | Multi-Query RAG | Fusion RAG |
|--------|-----------------|------------|
| Query generation | ✅ Multiple queries | ✅ Multiple queries |
| Retrieval | ✅ One search per query | ✅ One search per query |
| Fusion method | Simple deduplication | **RRF algorithm** |
| Ranking | Lost | **Preserved and combined** |
| Quality | Good | **Better** |
| Complexity | Low | Medium |
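The "Ranking: Lost" row becomes concrete once both fusion methods run on the same lists. In this hypothetical example, document `Z` is third for every query variant, while each variant has a different top hit. Concatenate-and-deduplicate (the approach of `multi_query_retriever_simple` later in this notebook) keeps first-seen order, whereas RRF rewards the consensus:

```python
from collections import defaultdict

# Hypothetical ranked results for three query variants (best first)
rankings = [
    ["X", "Y", "Z"],  # original query
    ["W", "V", "Z"],  # variant 1
    ["U", "T", "Z"],  # variant 2
]


def dedupe(ranked_lists: list[list[str]]) -> list[str]:
    """Multi-Query style: concatenate the lists and keep first occurrences."""
    seen: set[str] = set()
    merged: list[str] = []
    for ranking in ranked_lists:
        for doc_id in ranking:
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


def rrf(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion over ranked ID lists, best first."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


print("dedupe:", dedupe(rankings))
print("rrf:   ", rrf(rankings))
```

With a top-3 cutoff, deduplication returns `X, Y, Z` (two of them from the original query alone), while RRF returns `Z, X, W`.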

### When to Use

✅ **Good for:**
- Complex, multi-faceted questions
- Queries with multiple valid interpretations
- When ranking quality is important
- When you want robust retrieval

❌ **Not ideal for:**
- Simple factual lookups
- Latency-critical applications
- Limited API budgets

### Trade-offs

**Pros:**
- ✅ Better retrieval than single-query RAG
- ✅ More sophisticated than Multi-Query RAG
- ✅ Robust ranking algorithm
- ✅ Captures diverse perspectives

**Cons:**
- ❌ Higher latency (multiple retrievals)
- ❌ More API calls (query generation)
- ❌ More complex implementation

---

## Implementation

Let's build RAG-Fusion step by step.

## 1. Setup and Imports

In [1]:
import sys
import time
from pathlib import Path
from collections import defaultdict
from typing import List

# Add parent directory to path for imports
sys.path.append(str(Path("../..").resolve()))

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

from shared.config import (
    verify_api_key,
    DEFAULT_MODEL,
    DEFAULT_TEMPERATURE,
    OPENAI_EMBEDDING_MODEL,
    VECTOR_STORE_DIR,
)
from shared.loaders import load_and_split
from shared.prompts import (
    FUSION_QUERY_GENERATION_PROMPT,
    FUSION_RAG_ANSWER_PROMPT,
)
from shared.utils import (
    format_docs,
    print_section_header,
    load_vector_store,
    save_vector_store,
)

# Verify API key
verify_api_key()

print("✓ All imports successful")
print(f"✓ Using model: {DEFAULT_MODEL}")
print(f"✓ Using embeddings: {OPENAI_EMBEDDING_MODEL}")

✓ OpenAI API Key: LOADED
  Preview: sk-proj...vIQA
✓ All imports successful
✓ Using model: gpt-4o-mini
✓ Using embeddings: text-embedding-3-small


## 2. Load Documents and Create Vector Store

In [2]:
from langchain_community.vectorstores import FAISS

print_section_header("Loading Documents and Vector Store")

# Load and split documents (returns tuple: original_docs, chunks)
_, docs = load_and_split(
    chunk_size=1000,
    chunk_overlap=200,
)

print(f"\n✓ Loaded {len(docs)} chunks")

# Initialize embeddings
embeddings = OpenAIEmbeddings(model=OPENAI_EMBEDDING_MODEL)

# Load or create vector store
store_path = VECTOR_STORE_DIR / "fusion_rag"
vectorstore = load_vector_store(store_path, embeddings)

if vectorstore is None:
    print("\nCreating vector store...")
    vectorstore = FAISS.from_documents(docs, embeddings)
    save_vector_store(vectorstore, store_path)
    print("✓ Vector store created and saved")
else:
    print("✓ Loaded existing vector store")

# Create retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
print("✓ Retriever ready")


LOADING DOCUMENTS AND VECTOR STORE

Loading 4 documents from web...
  - https://python.langchain.com/docs/use_cases/question_answering/
  - https://python.langchain.com/docs/modules/data_connection/retrievers/
  - https://python.langchain.com/docs/modules/model_io/llms/
  - https://python.langchain.com/docs/use_cases/chatbots/
✓ Loaded 4 documents
✓ Added custom metadata to all documents
Splitting documents...
  - Chunk size: 1000
  - Chunk overlap: 200
✓ Created 122 chunks

  Sample chunk:
    - Length: 991 chars
    - Source: https://python.langchain.com/docs/use_cases/question_answering/
    - Preview: Build a RAG agent with LangChain - Docs by LangChainSkip to main contentWe've raised a $125M Series B to build the platform for agent engineering. Rea...

✓ Loaded 122 chunks
✗ Error loading vector store from /Users/gianlucamazza/Workspace/notebooks/llm_rag/data/vector_stores/fusion_rag: Error in faiss::FileIOReader::FileIOReader(const char *) at /Users/runner/work/faiss-wheels/faiss

## 3. Implement Reciprocal Rank Fusion

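The core algorithm of RAG-Fusion. The function keys documents by `page_content`, so a chunk returned by several queries collapses into one entry whose score accumulates across those queries.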

In [3]:
def reciprocal_rank_fusion(
    results_list: List[List[Document]],
    k: int = 60,
) -> List[tuple[Document, float]]:
    """
    Apply Reciprocal Rank Fusion to combine multiple ranked lists.
    
    Args:
        results_list: List of ranked document lists (one per query)
        k: RRF constant (default 60, as per literature)
    
    Returns:
        List of (document, score) tuples, sorted by score descending
    """
    # Track scores for each unique document
    doc_scores = defaultdict(float)
    doc_objects = {}  # Map content to document object
    
    # Process each query's results
    for results in results_list:
        for rank, doc in enumerate(results, start=1):
            # Use page_content as unique identifier
            doc_key = doc.page_content
            
            # Calculate RRF score contribution
            score = 1.0 / (k + rank)
            doc_scores[doc_key] += score
            
            # Store document object (in case of duplicates, keep first)
            if doc_key not in doc_objects:
                doc_objects[doc_key] = doc
    
    # Sort by score descending
    ranked_docs = [
        (doc_objects[doc_key], score)
        for doc_key, score in sorted(
            doc_scores.items(),
            key=lambda x: x[1],
            reverse=True,
        )
    ]
    
    return ranked_docs


print("✓ Reciprocal Rank Fusion function defined")

# Worked example of the RRF arithmetic (computed by hand, not by the function)
print("\nExample RRF calculation:")
print("-" * 80)
print("Query 1 ranks: [A(rank=1), B(rank=2), C(rank=3)]")
print("Query 2 ranks: [C(rank=1), A(rank=2), D(rank=3)]")
print("Query 3 ranks: [B(rank=1), C(rank=2), A(rank=3)]")
print("\nRRF scores (k=60):")
print("  Doc A: 1/61 + 1/62 + 1/63 = 0.0164 + 0.0161 + 0.0159 = 0.0484")
print("  Doc B: 1/62 + 1/61 = 0.0161 + 0.0164 = 0.0325")
print("  Doc C: 1/63 + 1/61 + 1/62 = 0.0159 + 0.0164 + 0.0161 = 0.0484")
print("  Doc D: 1/63 = 0.0159")
print("\nFinal ranking: A=C (tie) > B > D")

✓ Reciprocal Rank Fusion function defined

Example RRF calculation:
--------------------------------------------------------------------------------
Query 1 ranks: [A(rank=1), B(rank=2), C(rank=3)]
Query 2 ranks: [C(rank=1), A(rank=2), D(rank=3)]
Query 3 ranks: [B(rank=1), C(rank=2), A(rank=3)]

RRF scores (k=60):
  Doc A: 1/61 + 1/62 + 1/63 = 0.0164 + 0.0161 + 0.0159 = 0.0484
  Doc B: 1/62 + 1/61 = 0.0161 + 0.0164 = 0.0325
  Doc C: 1/63 + 1/61 + 1/62 = 0.0159 + 0.0164 + 0.0161 = 0.0484
  Doc D: 1/63 = 0.0159

Final ranking: A=C (tie) > B > D
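The hand calculation shows an exact tie between A and C, but `reciprocal_rank_fusion` sums floats in query order (1/61 + 1/62 + 1/63 for A, 1/63 + 1/61 + 1/62 for C). The two totals may therefore differ in the last bits, and the tie may resolve either way. If reproducible ordering matters, one option (a sketch, not what the notebook does) is to accumulate exact fractions and break ties explicitly, here by first appearance:

```python
from fractions import Fraction


def rrf_exact(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, Fraction]]:
    """RRF with exact rational scores; ties are broken by first appearance."""
    scores: dict[str, Fraction] = {}
    first_seen: dict[str, int] = {}
    position = 0
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, Fraction(0)) + Fraction(1, k + rank)
            first_seen.setdefault(doc_id, position)
            position += 1
    # Highest score first; equal scores fall back to first-seen position
    return sorted(scores.items(), key=lambda kv: (-kv[1], first_seen[kv[0]]))


ranked = rrf_exact([
    ["A", "B", "C"],  # Query 1
    ["C", "A", "D"],  # Query 2
    ["B", "C", "A"],  # Query 3
])
for doc_id, score in ranked:
    print(doc_id, score, f"~ {float(score):.4f}")
```

Floats are perfectly adequate for ranking in practice; the exact variant mainly helps when tests or logs need to be deterministic.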


## 4. Build Query Generation Chain

In [4]:
print_section_header("Building Query Generation Chain")

# Initialize LLM
llm = ChatOpenAI(
    model=DEFAULT_MODEL,
    temperature=DEFAULT_TEMPERATURE,
)

# Create query generation chain
query_gen_chain = FUSION_QUERY_GENERATION_PROMPT | llm | StrOutputParser()

print("✓ Query generation chain created")

# Test query generation
test_query = "What is LCEL in LangChain?"
print(f"\nTest query: {test_query}")
print("\nGenerated alternatives:")
print("-" * 80)

alternative_queries_text = query_gen_chain.invoke({
    "question": test_query,
    "num_queries": 3,
})
print(alternative_queries_text)


BUILDING QUERY GENERATION CHAIN

✓ Query generation chain created

Test query: What is LCEL in LangChain?

Generated alternatives:
--------------------------------------------------------------------------------
1. Can you explain what LCEL stands for in the context of LangChain?
2. What does LCEL mean, and how is it used within LangChain? Additionally, what are its key features?
3. In LangChain, what is the significance of LCEL, and how does it relate to other components of the framework?
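Generated alternatives can overlap with the original question or with each other, which spends a retrieval on a near-copy. A hypothetical filter (`filter_queries` is not part of the notebook) that normalizes case and punctuation before comparing might look like this:

```python
import re


def filter_queries(original: str, candidates: list[str], max_new: int = 3) -> list[str]:
    """Drop generated queries that repeat the original or each other."""

    def normalize(text: str) -> str:
        # Lowercase and keep only runs of letters/digits, joined by single spaces
        return " ".join(re.findall(r"[a-z0-9]+", text.lower()))

    seen = {normalize(original)}
    kept = []
    for query in candidates:
        key = normalize(query)
        if key and key not in seen:
            seen.add(key)
            kept.append(query)
    return kept[:max_new]


original = "What is LCEL in LangChain?"
generated = [
    "Can you explain what LCEL stands for in the context of LangChain?",
    "what is lcel in LangChain",  # same as the original after normalization
    "What does LCEL mean, and how is it used within LangChain?",
    "Can you explain what LCEL stands for in the context of LangChain",  # repeat
]
print(filter_queries(original, generated))
```

This only catches surface-level duplicates. Paraphrases with different wording, like the three alternatives above, pass through, which is the point of generating them.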


## 5. Implement Fusion RAG Retriever

In [None]:
def fusion_retriever(
    query: str,
    retriever,
    llm,
    num_queries: int = 4,
    k_final: int = 6,
    rrf_k: int = 60,
    verbose: bool = False,
) -> List[Document]:
    """
    Fusion RAG retriever with RRF.
    
    Args:
        query: Original user query
        retriever: LangChain retriever
        llm: LLM for query generation
        num_queries: Number of alternative queries to generate
        k_final: Number of final documents to return
        rrf_k: RRF constant
        verbose: Print debug information
    
    Returns:
        List of re-ranked documents
    """
    if verbose:
        print(f"\n[Fusion Retriever] Original query: {query}")
    
    # 1. Generate alternative queries
    query_gen_chain = FUSION_QUERY_GENERATION_PROMPT | llm | StrOutputParser()
    alternatives_text = query_gen_chain.invoke({
        "question": query,
        "num_queries": num_queries - 1,  # -1 because we include original
    })
    
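    # Parse alternative queries: one per non-empty line that contains letters
    alternative_queries = []
    for line in alternatives_text.split("\n"):
        line = line.strip()
        if not line or not any(char.isalpha() for char in line):
            continue
        # Strip list numbering such as "1. Query" or "2) Query"; a query that
        # merely starts with a digit (e.g. "3D charts ...") is left intact
        without_digits = line.lstrip("0123456789")
        if without_digits != line and without_digits[:1] in (".", ")"):
            line = without_digits[1:].strip()
        alternative_queries.append(line)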
    
    # Include original query
    all_queries = [query] + alternative_queries[:num_queries-1]
    
    if verbose:
        print(f"\n[Fusion Retriever] Generated {len(all_queries)} queries:")
        for i, q in enumerate(all_queries, 1):
            print(f"  {i}. {q}")
    
    # 2. Retrieve documents for each query
    all_results = []
    for q in all_queries:
        results = retriever.invoke(q)
        all_results.append(results)
        if verbose:
            print(f"\n[Fusion Retriever] Query '{q[:50]}...' retrieved {len(results)} docs")
    
    # 3. Apply Reciprocal Rank Fusion
    fused_results = reciprocal_rank_fusion(all_results, k=rrf_k)
    
    if verbose:
        print("\n[Fusion Retriever] After RRF, top 3 scores:")
        for i, (doc, score) in enumerate(fused_results[:3], 1):
            print(f"  {i}. Score: {score:.4f} | Preview: {doc.page_content[:60]}...")
    
    # 4. Return top-k documents
    final_docs = [doc for doc, score in fused_results[:k_final]]
    
    if verbose:
        print(f"\n[Fusion Retriever] Returning top {len(final_docs)} documents")
    
    return final_docs


print("✓ Fusion retriever function defined")

✓ Fusion retriever function defined
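The loop in `fusion_retriever` runs the searches one after another, so retrieval latency grows with `num_queries`. Because each search is independent, they can run concurrently. This sketch uses `concurrent.futures.ThreadPoolExecutor` with a hypothetical `slow_retrieve` stand-in that sleeps to simulate network latency. In LangChain, retrievers are Runnables, so `retriever.batch(all_queries)` should offer similar concurrency without a hand-rolled pool:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_retrieve(query: str) -> list[str]:
    """Hypothetical stand-in for retriever.invoke with ~0.2 s of simulated latency."""
    time.sleep(0.2)
    return [f"{query}-doc{i}" for i in range(1, 3)]


def retrieve_all(queries: list[str], max_workers: int = 4) -> list[list[str]]:
    """Run one retrieval per query concurrently; results keep the query order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(slow_retrieve, queries))


queries = ["q1", "q2", "q3", "q4"]
start = time.perf_counter()
results = retrieve_all(queries)
elapsed = time.perf_counter() - start
# Sequentially this would take about 4 x 0.2 s; concurrently, close to 0.2 s
print(f"{len(results)} result lists in {elapsed:.2f}s")
```

`pool.map` returns results in the same order as `queries`, so each ranked list stays paired with the query that produced it, which keeps verbose logs aligned with `all_queries`.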


## 6. Build Fusion RAG Chain

In [12]:
# Create a custom runnable for fusion retrieval
from langchain_core.runnables import RunnableLambda

print_section_header("Building Fusion RAG Chain")

NUM_QUERIES = 4  # original query + 3 LLM-generated alternatives
K_FINAL = 6  # documents passed to the answer prompt after RRF

fusion_retriever_runnable = RunnableLambda(
    lambda x: fusion_retriever(
        query=x if isinstance(x, str) else str(x),
        retriever=retriever,
        llm=llm,
        num_queries=NUM_QUERIES,
        k_final=K_FINAL,
        verbose=False,
    )
)

# Build the RAG chain; the prompt also receives the query and document counts
fusion_rag_chain = (
    {
        "context": fusion_retriever_runnable | format_docs,
        "input": RunnablePassthrough(),
        "original_query": RunnablePassthrough(),
        "num_queries": lambda _: NUM_QUERIES,
        "num_docs": lambda _: K_FINAL,
    }
    | FUSION_RAG_ANSWER_PROMPT
    | llm
    | StrOutputParser()
)

print("✓ Fusion RAG chain created")


BUILDING FUSION RAG CHAIN

✓ Fusion RAG chain created


## 7. Test Fusion RAG

In [7]:
print_section_header("Testing Fusion RAG")

test_queries = [
    "What is LCEL and how do I use it?",
    "Explain different types of memory in conversational AI",
    "How do retrievers work in RAG applications?",
]

for i, query in enumerate(test_queries, 1):
    print("\n" + "=" * 80)
    print(f"Query {i}: {query}")
    print("=" * 80)
    
    start_time = time.time()
    response = fusion_rag_chain.invoke(query)
    elapsed = time.time() - start_time
    
    print("\n" + response)
    print(f"\n⏱️  Time: {elapsed:.2f}s")


TESTING FUSION RAG


Query 1: What is LCEL and how do I use it?

LCEL stands for LangChain's Enhanced Language model, which is a framework designed to facilitate the development of applications powered by large language models (LLMs). It provides a standardized interface for interacting with different model providers, making it easier to build and deploy applications without being locked into a specific provider.

To use LCEL, you can follow these steps:

1. **Installation**: First, you need to install LangChain using pip. You can do this by running the following command in your terminal:
   ```
   pip install -U langchain
   ```
   Make sure you have Python 3.10 or higher installed.

2. **Building an Agent**: LangChain allows you to create agents with minimal code. You can build a simple agent in under 10 lines of code. This agent can handle various tasks, including answering questions, performing searches, and more.

3. **Using LangGraph**: For more advanced needs, you can utilize L

## 8. Detailed Example: Inspect the Fusion Process

In [None]:
print_section_header("Detailed Fusion Process Analysis")

query = "What are the main components of LangChain?"
print(f"Query: {query}\n")

# Run with verbose mode
print("=" * 80)
print("FUSION RETRIEVAL PROCESS:")
print("=" * 80)

fused_docs = fusion_retriever(
    query=query,
    retriever=retriever,
    llm=llm,
    num_queries=4,
    k_final=6,
    verbose=True,
)

print("\n" + "=" * 80)
print("FINAL RETRIEVED DOCUMENTS:")
print("=" * 80)
for i, doc in enumerate(fused_docs, 1):
    print(f"\nDocument {i}:")
    print(f"Source: {doc.metadata.get('source', 'unknown')[:60]}")
    print(f"Content: {doc.page_content[:200]}...")


DETAILED FUSION PROCESS ANALYSIS

Query: What are the main components of LangChain?

FUSION RETRIEVAL PROCESS:

[Fusion Retriever] Original query: What are the main components of LangChain?

[Fusion Retriever] Generated 4 queries:
  1. What are the main components of LangChain?
  2. What are the key elements that make up LangChain?
  3. Can you list the primary components of LangChain? Additionally, how do these components interact with each other?
  4. What are the essential parts of the LangChain framework, and what roles do they play in its functionality?

[Fusion Retriever] Query 'What are the main components of LangChain?...' retrieved 4 docs

[Fusion Retriever] Query 'What are the key elements that make up LangChain?...' retrieved 4 docs

[Fusion Retriever] Query 'Can you list the primary components of LangChain? ...' retrieved 4 docs

[Fusion Retriever] Query 'What are the essential parts of the LangChain fram...' retrieved 4 docs

[Fusion Retriever] After RRF, top 3 scores:
  

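The verbose trace prints scores but not why a document scored well. A small extension, sketched here with hypothetical chunk IDs (`rrf_with_provenance` is illustrative, not a notebook helper), records which query lists each document appeared in and at what rank:

```python
from collections import defaultdict


def rrf_with_provenance(ranked_lists: list[list[str]], k: int = 60):
    """RRF that also records (query_index, rank) hits for every document."""
    scores = defaultdict(float)
    hits = defaultdict(list)
    for q_idx, ranking in enumerate(ranked_lists):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
            hits[doc_id].append((q_idx, rank))
    order = sorted(scores, key=scores.get, reverse=True)
    return [(doc_id, scores[doc_id], hits[doc_id]) for doc_id in order]


# Hypothetical chunk IDs returned by four queries (best first)
results_per_query = [
    ["c1", "c2", "c3", "c4"],
    ["c2", "c1", "c5", "c3"],
    ["c2", "c6", "c1", "c7"],
    ["c8", "c2", "c1", "c9"],
]

for doc_id, score, doc_hits in rrf_with_provenance(results_per_query)[:3]:
    where = ", ".join(f"q{q_idx + 1}@{rank}" for q_idx, rank in doc_hits)
    print(f"{doc_id}: {score:.4f} in {len(doc_hits)}/{len(results_per_query)} lists ({where})")
```

Here `c2` and `c1` appear in all four lists and `c3` in two, while single-hit documents such as `c8` fall below them even though `c8` is ranked first by one query.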
## 9. Comparison: Standard vs Multi-Query vs Fusion RAG

In [13]:
# Build comparison chains
from langchain_core.runnables import RunnableLambda
from shared.prompts import RAG_PROMPT_TEMPLATE, MULTI_QUERY_PROMPT

print_section_header("Comparison: Standard vs Multi-Query vs Fusion RAG")

# Standard RAG
standard_chain = (
    {"context": retriever | format_docs, "input": RunnablePassthrough()}
    | RAG_PROMPT_TEMPLATE
    | llm
    | StrOutputParser()
)

# Multi-Query RAG (simple deduplication)
def multi_query_retriever_simple(query: str) -> List[Document]:
    """Multi-Query RAG with simple deduplication."""
    # Generate alternatives (named to avoid confusion with the
    # module-level multi_query_chain RAG chain defined below)
    query_variants_chain = MULTI_QUERY_PROMPT | llm | StrOutputParser()
    alternatives = query_variants_chain.invoke({"question": query})
    alternative_queries = [
        line.strip() for line in alternatives.split("\n") if line.strip()
    ]
    
    # Retrieve for each query
    all_docs = []
    seen_content = set()
    
    for q in [query] + alternative_queries[:2]:
        docs = retriever.invoke(q)
        for doc in docs:
            if doc.page_content not in seen_content:
                all_docs.append(doc)
                seen_content.add(doc.page_content)
    
    return all_docs[:6]

multi_query_chain = (
    {
        "context": RunnableLambda(multi_query_retriever_simple) | format_docs,
        "input": RunnablePassthrough(),
    }
    | RAG_PROMPT_TEMPLATE
    | llm
    | StrOutputParser()
)

# Test query
test_query = "How do I build chains in LangChain?"

print(f"\nQuery: {test_query}\n")
print("=" * 80)

# Standard RAG
print("\n[1] STANDARD RAG")
print("-" * 80)
start = time.time()
response_standard = standard_chain.invoke(test_query)
time_standard = time.time() - start
print(response_standard)
print(f"\n⏱️  Time: {time_standard:.2f}s")

# Multi-Query RAG
print("\n" + "=" * 80)
print("\n[2] MULTI-QUERY RAG (Simple Deduplication)")
print("-" * 80)
start = time.time()
response_multi = multi_query_chain.invoke(test_query)
time_multi = time.time() - start
print(response_multi)
print(f"\n⏱️  Time: {time_multi:.2f}s")

# Fusion RAG
print("\n" + "=" * 80)
print("\n[3] FUSION RAG (Reciprocal Rank Fusion)")
print("-" * 80)
start = time.time()
response_fusion = fusion_rag_chain.invoke(test_query)
time_fusion = time.time() - start
print(response_fusion)
print(f"\n⏱️  Time: {time_fusion:.2f}s")

# Summary
print("\n" + "=" * 80)
print("PERFORMANCE SUMMARY:")
print("=" * 80)
print(f"Standard RAG:     {time_standard:.2f}s (baseline)")
print(f"Multi-Query RAG:  {time_multi:.2f}s ({time_multi/time_standard:.1f}x slower)")
print(f"Fusion RAG:       {time_fusion:.2f}s ({time_fusion/time_standard:.1f}x slower)")


COMPARISON: STANDARD VS MULTI-QUERY VS FUSION RAG


Query: How do I build chains in LangChain?


[1] STANDARD RAG
--------------------------------------------------------------------------------
The context provided does not contain specific information on how to build chains in LangChain. It mentions various components and features of LangChain, such as RAG agents and the integration with LangGraph, but does not detail the process for building chains. You may need to refer to the official LangChain documentation or tutorials for detailed instructions on building chains.

⏱️  Time: 4.28s


[2] MULTI-QUERY RAG (Simple Deduplication)
--------------------------------------------------------------------------------
The context provided does not contain specific information on how to build chains in LangChain. It mentions that LangChain allows for building agents and applications powered by LLMs and refers to various components and features, but it does not detail the steps or methods for 

## 10. RRF Parameter Tuning

Explore how different RRF parameters affect ranking.

In [10]:
print_section_header("RRF Parameter Tuning")

query = "What is LCEL?"
print(f"Query: {query}\n")

# Test different k values
k_values = [10, 60, 100]

print("Testing different RRF k values:")
print("=" * 80)

for k in k_values:
    print(f"\nk = {k}:")
    print("-" * 80)
    
    docs = fusion_retriever(
        query=query,
        retriever=retriever,
        llm=llm,
        num_queries=3,
        k_final=3,
        rrf_k=k,
        verbose=False,
    )
    
    for i, doc in enumerate(docs, 1):
        print(f"  {i}. {doc.page_content[:80]}...")

print("\n" + "=" * 80)
print("OBSERVATIONS:")
print("=" * 80)
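print("• Lower k (e.g., 10): rank-1 hits weigh noticeably more than lower-ranked hits")
print("• Higher k (e.g., 100): weights become nearly uniform across ranks")
print("• k = 60: the value used in the original RRF paper (Cormack et al., 2009)")
print("• With only 4 results per query, changing k rarely reorders the top results;")
print("  fusion_retriever also regenerates the queries on every call, so differences")
print("  between runs may come from the LLM rather than from k")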


RRF PARAMETER TUNING

Query: What is LCEL?

Testing different RRF k values:

k = 10:
--------------------------------------------------------------------------------
  1. ✅ Benefits⚠️ Drawbacks
Search only when needed – The LLM can handle greetings, f...
  2. ​ Core benefits
Standard model interfaceDifferent providers have unique APIs for...
  3. LangChain overview - Docs by LangChainSkip to main contentWe've raised a $125M S...

k = 60:
--------------------------------------------------------------------------------
  1. ✅ Benefits⚠️ Drawbacks
Search only when needed – The LLM can handle greetings, f...
  2. ​ Core benefits
Standard model interfaceDifferent providers have unique APIs for...
  3. LangChain overview - Docs by LangChainSkip to main contentWe've raised a $125M S...

k = 100:
--------------------------------------------------------------------------------
  1. ✅ Benefits⚠️ Drawbacks
Search only when needed – The LLM can handle greetings, f...
  2. ​ Core benefits
Standard
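Each pass of the loop above calls `fusion_retriever` again, which also regenerates the alternative queries, so it cannot isolate the effect of `k`. Holding the ranked lists fixed does. In this hypothetical setup, `X` is a single rank-1 hit and `Y` appears in two lists but only at rank 4:

```python
from collections import defaultdict


def rrf_order(ranked_lists: list[list[str]], k: int) -> list[str]:
    """Reciprocal Rank Fusion over ranked ID lists, best first."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Fixed, hypothetical result lists, so only k varies between runs
fixed_lists = [
    ["X", "P", "Q", "Y"],  # X is a single top hit
    ["R", "S", "T", "Y"],  # Y appears twice, but only at rank 4
]

for k in (1, 10, 60, 100):
    # Weight of a rank-1 hit relative to a rank-4 hit: (k + 4) / (k + 1)
    ratio = (k + 4) / (k + 1)
    print(f"k={k:>3}  rank1/rank4 weight={ratio:.2f}  order={rrf_order(fixed_lists, k)}")
```

`Y` overtakes `X` as soon as 2/(k+4) > 1/(k+1), that is for any k > 2, so only very small values of `k` favor single top hits. With lists this short, k = 10, 60, and 100 produce the same order, which is consistent with the matching rankings printed above.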

## 11. Performance Metrics

In [11]:
print_section_header("Performance Metrics")

num_alternative_queries = 3  # generated by ONE LLM call (+ 1 original = 4 total)
total_queries = num_alternative_queries + 1
retrievals_per_query = 4  # retriever k
query_gen_calls = 1  # query_gen_chain.invoke returns all alternatives at once
answer_gen_calls = 1

print("\nCOST BREAKDOWN:")
print("-" * 80)
print(f"Query generation: {query_gen_calls} LLM call ({num_alternative_queries} alternatives)")
print(f"Retrievals: {total_queries} vector searches")
print(f"Documents retrieved (total): ~{total_queries * retrievals_per_query}")
print("Documents after RRF: 6 (deduplicated + reranked)")
print(f"Final generation: {answer_gen_calls} LLM call")
print(f"\nTotal LLM calls: {query_gen_calls + answer_gen_calls}")
print(f"Total vector searches: {total_queries}")

print("\n" + "=" * 80)
print("COMPARISON WITH OTHER APPROACHES:")
print("=" * 80)
print("\nLLM Calls:")
print("  • Standard RAG: 1 (answer generation only)")
print("  • Multi-Query RAG: 2 (1 query generation + 1 answer generation)")
print("  • Fusion RAG: 2 (1 query generation + 1 answer generation)")
print("\nVector Searches:")
print("  • Standard RAG: 1")
print("  • Multi-Query RAG: 3-4")
print("  • Fusion RAG: 4")
print("\nRanking Quality (expected, not benchmarked here):")
print("  • Standard RAG: ⭐⭐⭐ (baseline)")
print("  • Multi-Query RAG: ⭐⭐⭐⭐ (better coverage)")
print("  • Fusion RAG: ⭐⭐⭐⭐⭐ (best ranking)")


PERFORMANCE METRICS


COST BREAKDOWN:
--------------------------------------------------------------------------------
Query generation: 1 LLM call (3 alternatives)
Retrievals: 4 vector searches
Documents retrieved (total): ~16
Documents after RRF: 6 (deduplicated + reranked)
Final generation: 1 LLM call

Total LLM calls: 2
Total vector searches: 4

COMPARISON WITH OTHER APPROACHES:

LLM Calls:
  • Standard RAG: 1 (answer generation only)
  • Multi-Query RAG: 2 (1 query generation + 1 answer generation)
  • Fusion RAG: 2 (1 query generation + 1 answer generation)

Vector Searches:
  • Standard RAG: 1
  • Multi-Query RAG: 3-4
  • Fusion RAG: 4

Ranking Quality (expected, not benchmarked here):
  • Standard RAG: ⭐⭐⭐ (baseline)
  • Multi-Query RAG: ⭐⭐⭐⭐ (better coverage)
  • Fusion RAG: ⭐⭐⭐⭐⭐ (best ranking)


## 12. Key Takeaways

### Summary

**Fusion RAG** improves on Multi-Query RAG by using Reciprocal Rank Fusion:
- Generates multiple query variations (like Multi-Query)
- Applies sophisticated RRF algorithm for ranking
- Preserves and combines ranking information
- More robust than simple deduplication

### RRF Algorithm Benefits

✅ **Why RRF works:**
- Favors documents appearing in multiple results
- Balances frequency and rank position
- No score normalization needed
- Robust to outliers and variations
- Proven effective in information retrieval

### Cost-Benefit Analysis

| Aspect | Impact | Notes |
|--------|--------|-------|
| **Query Latency** | ❌ 2-3x slower | Multiple retrievals |
| **LLM Calls** | ❌ +1 call | Query generation (one call returns all alternatives) |
| **Retrieval Quality** | ✅ Best | RRF ranking |
| **Coverage** | ✅ Excellent | Multiple perspectives |
| **Robustness** | ✅ High | Less sensitive to wording |
| **Implementation** | ⚠️ Medium | More complex |

### Best Practices

1. **Query count**: 3-5 queries is a common starting point; beyond that, extra queries add cost with diminishing returns
2. **RRF constant (k)**: 60 is standard, tune for your use case
3. **Final k**: Retrieve more candidates than you pass to the LLM (here 4 per query, ~16 in total, trimmed to the top 6 after RRF) so fusion has room to reorder
4. **Caching**: Cache query generations for common queries
5. **Monitoring**: Track which queries benefit most from fusion
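For the caching practice, here is a minimal in-process sketch using `functools.lru_cache`. `generate_alternatives` is a hypothetical stand-in for `query_gen_chain.invoke`, and the normalization (lowercasing, collapsing whitespace) is an assumption about what should count as the same question:

```python
from functools import lru_cache

calls = {"count": 0}  # counts how often the (stubbed) LLM is actually invoked


def generate_alternatives(question: str) -> tuple[str, ...]:
    """Hypothetical stand-in for query_gen_chain.invoke."""
    calls["count"] += 1
    return (f"{question} (rephrased)", f"{question} (expanded)")


@lru_cache(maxsize=1024)
def cached_alternatives(normalized_question: str) -> tuple[str, ...]:
    # A tuple keeps the cached value immutable for every caller
    return generate_alternatives(normalized_question)


def get_alternatives(question: str) -> tuple[str, ...]:
    # Collapse case and whitespace so trivially different strings share an entry
    return cached_alternatives(" ".join(question.lower().split()))


get_alternatives("What is LCEL?")
get_alternatives("  what is   LCEL? ")
print(calls["count"], cached_alternatives.cache_info())
```

The cached function receives the normalized text, so the LLM sees the lowercased question; a cache keyed on the normalized text but invoked with the original string is an alternative. Caching also makes a repeated question return the same alternatives even when `DEFAULT_TEMPERATURE` is above zero.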

### When to Use

Choose **Fusion RAG** when:
- ✅ Query quality matters more than speed
- ✅ Dealing with complex, multi-faceted questions
- ✅ Need robust retrieval across query variations
- ✅ Can afford extra latency and cost

Choose **Multi-Query RAG** when:
- ✅ Simpler implementation preferred
- ✅ Deduplication is sufficient
- ✅ Slightly lower costs desired

Choose **Standard RAG** when:
- ✅ Speed is critical
- ✅ Simple queries
- ✅ Tight budget constraints

### Extensions

- **Hybrid fusion**: Combine with keyword search
- **Weighted queries**: Assign different weights to query types
- **Query filtering**: Remove low-quality generated queries
- **Adaptive fusion**: Use fusion only for complex queries

---

**Complexity Rating:** ⭐⭐⭐ (Medium - RRF algorithm adds sophistication)

**Production Readiness:** ⭐⭐⭐⭐ (High - proven algorithm, manageable trade-offs)

Continue to **14_sql_rag.ipynb** for Natural Language to SQL RAG!