# **RAG Fusion**
RAG-Fusion extends the standard Retrieval-Augmented Generation (RAG) pipeline. Given a user query, it first uses a large language model to generate several related sub-queries and retrieves documents for each one. Rather than passing the retrieved documents straight to the model, it applies Reciprocal Rank Fusion (RRF) to merge the per-query result lists into a single ranking, scoring each document by its rank positions across those lists. The top-ranked documents are then used as context, with the aim of producing a more accurate response.

Research Paper: [RAG Fusion](https://arxiv.org/pdf/2402.03367)

# **📚 Notebook Summary & Step-by-Step Guide**

## **🎯 What This Notebook Does**
This notebook implements **RAG Fusion** - an advanced retrieval technique that generates **multiple sub-queries** from a single user question, retrieves documents for each sub-query, then uses **Reciprocal Rank Fusion (RRF)** to intelligently combine and rank results for optimal context selection.

## **🔧 Key Libraries & Their Roles**

### **Core RAG Libraries**
- **`langchain`** - Main orchestration framework for advanced RAG patterns
- **`langchain-openai`** - OpenAI integration for embeddings and chat
- **`langchain-community`** - Extended components including Qdrant integration
- **`athina`** - RAG evaluation and monitoring framework

### **Vector Database**
- **`Qdrant`** - Production-grade vector database with gRPC support
- **`OpenAIEmbeddings`** - Text-to-vector conversion for semantic search

### **Monitoring & Evaluation**
- **`langsmith`** - LangChain monitoring and debugging platform
- **`Athina`** - RAG pipeline evaluation and quality assessment

## **📋 Step-by-Step Process**

### **Step 1: Advanced Setup**
- Install fusion-specific packages: `athina`, `langsmith`
- Configure multiple API keys: OpenAI, Athina, Qdrant
- Set up production-grade vector database connection

### **Step 2: Document Processing**
- Load CSV data using standard `CSVLoader`
- Split documents into 500-character chunks
- Prepare documents for multi-query retrieval
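
As a rough illustration of what fixed-size chunking does, the sketch below splits a string into pieces of at most 500 characters with no overlap. It is a simplified stand-in, not the `RecursiveCharacterTextSplitter` used later in this notebook (which also tries to break on separators such as newlines); `chunk_text` is a hypothetical helper name.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 0) -> list[str]:
    """Split text into windows of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    # Each window starts `step` characters after the previous one.
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = "x" * 1200
chunks = chunk_text(sample)
print([len(c) for c in chunks])  # [500, 500, 200]
```

With `chunk_overlap=0`, as in this notebook, the chunks concatenate back to the original text.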

### **Step 3: Qdrant Vector Database Setup**
- Create production Qdrant vectorstore with gRPC protocol
- Configure cloud-based vector storage for scalability
- Index documents with OpenAI embeddings

### **Step 4: Multi-Query Generation**
- **Query Expansion**: Generate 3-5 related sub-queries from original question
- **Perspective Diversification**: Create queries from different angles
- **Coverage Enhancement**: Ensure comprehensive topic exploration

### **Step 5: Parallel Retrieval**
- Execute retrieval for each generated sub-query
- Collect multiple sets of relevant documents
- Maintain query-document relationship mapping
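
The retrieval step can be sketched without any external service. The example below assumes a hypothetical in-memory corpus and a toy word-overlap retriever (`CORPUS`, `toy_retrieve`, and `retrieve_all` are illustrative names, not part of LangChain or Qdrant). It runs one retrieval per sub-query in a thread pool and keeps the mapping from each query to its results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy corpus standing in for the vector store.
CORPUS = {
    "doc_a": "discount points are pre-paid interest on a mortgage",
    "doc_b": "one point equals one percent of the loan amount",
    "doc_c": "fixed-rate mortgages keep the same interest rate",
}


def toy_retrieve(query: str, top_k: int = 2) -> list[str]:
    """Rank documents by the number of words they share with the query."""
    query_words = set(query.lower().split())
    scored = [
        (len(query_words & set(text.split())), doc_id)
        for doc_id, text in CORPUS.items()
    ]
    # Highest overlap first; ties are broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for overlap, doc_id in scored[:top_k] if overlap > 0]


def retrieve_all(queries: list[str]) -> dict[str, list[str]]:
    """Run one retrieval per sub-query and keep the query -> results mapping."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(toy_retrieve, queries))
    return dict(zip(queries, results))


sub_queries = ["what are mortgage points", "how much does one point cost"]
results_by_query = retrieve_all(sub_queries)
print(results_by_query)
```

In the notebook itself, `retriever.map()` plays the role of `retrieve_all`, returning one document list per generated query.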

### **Step 6: Reciprocal Rank Fusion (RRF)**
- **Rank Documents**: Score documents from each query independently
- **Fusion Algorithm**: Combine rankings using RRF mathematical formula
- **Final Ranking**: Produce unified ranking of most relevant documents

### **Step 7: Enhanced Response Generation**
- Use top-ranked documents as context
- Generate response with improved document relevance
- Leverage diverse perspectives from multi-query approach

## **🚀 Key Advantages of RAG Fusion**

### **Query Understanding Enhancement**
- **Multi-Perspective**: Approaches questions from different angles
- **Comprehensive Coverage**: Reduces chance of missing relevant information
- **Query Robustness**: Handles ambiguous or complex questions better

### **Retrieval Quality Improvement**
- **Diverse Results**: Multiple queries find different relevant documents
- **Ranking Fusion**: RRF combines rankings better than simple concatenation
- **Relevance Optimization**: Mathematical fusion improves final document selection

### **Real-World Benefits**
- **Better Recall**: Finds more relevant documents across different query formulations
- **Improved Precision**: RRF ranking reduces noise in final context
- **Production Ready**: Uses enterprise-grade Qdrant for scalability

## **🧮 Reciprocal Rank Fusion (RRF) Formula**

```
RRF_score(doc) = Σ(1 / (k + rank_in_query_i))
```

Where:
- **k**: Smoothing constant (typically 60)
- **rank_in_query_i**: Document's rank in the results for query i (1 for the top hit in the usual definition; the notebook's implementation counts from 0)
- **Σ**: Sum across all queries where document appears
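
To make the formula concrete, here is a small worked example with three hypothetical ranked lists of document ids. It is a sketch that assumes 1-based ranks and `k = 60`; `rrf_scores` is an illustrative helper, separate from the `reciprocal_rank_fusion` function defined later in this notebook, which counts ranks from 0.

```python
from collections import defaultdict


def rrf_scores(rankings: list[list[str]], k: int = 60) -> dict[str, float]:
    """Sum 1 / (k + rank) over every ranking in which a document appears."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks start at 1
            scores[doc_id] += 1.0 / (k + rank)
    return dict(scores)


rankings = [
    ["A", "B", "C"],  # results for sub-query 1
    ["B", "A"],       # results for sub-query 2
    ["B", "D"],       # results for sub-query 3
]
scores = rrf_scores(rankings)
ordered = sorted(scores, key=scores.get, reverse=True)
print(ordered)  # ['B', 'A', 'D', 'C']
```

Document B wins because it appears in all three lists (1/62 + 1/61 + 1/61), while D edges out C because rank 2 in one list (1/62) beats rank 3 in one list (1/63).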

## **📊 Fusion Workflow Comparison**

| Stage | Traditional RAG | RAG Fusion | Advantage |
|-------|----------------|------------|-----------|
| **Query Processing** | Single query | 3-5 sub-queries | Better coverage |
| **Retrieval** | One search | Multiple searches | Diverse results |
| **Ranking** | Single ranking | RRF fusion | Favors documents ranked highly by several queries |
| **Context Quality** | Depends on one query formulation | Draws on several formulations | Multi-perspective |

## **🏗️ Architecture Pattern**

```
User Query → Query Generation → Multi-Retrieval → RRF Fusion → LLM Response
     ↓              ↓               ↓              ↓
Single Question → Sub-Queries → Document Sets → Ranked Docs → Final Answer
```
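
The same flow can be sketched as plain function composition. The example below uses hypothetical stubs in place of the LLM and the vector store (`stub_generate`, `stub_retrieve`, `stub_answer`), so it only illustrates how the stages connect, not how the notebook's LangChain chain is built.

```python
from collections import defaultdict
from typing import Callable


def fuse_rankings(doc_sets: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion over lists of document ids (ranks start at 1)."""
    scores: dict[str, float] = defaultdict(float)
    for docs in doc_sets:
        for rank, doc_id in enumerate(docs, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


def rag_fusion(
    question: str,
    generate_queries: Callable[[str], list[str]],
    retrieve: Callable[[str], list[str]],
    answer: Callable[[str, list[str]], dict],
    top_n: int = 2,
) -> dict:
    sub_queries = generate_queries(question)       # Query Generation
    doc_sets = [retrieve(q) for q in sub_queries]  # Multi-Retrieval
    ranked = fuse_rankings(doc_sets)               # RRF Fusion
    return answer(question, ranked[:top_n])        # LLM Response


# Hypothetical stubs standing in for the LLM and the vector store.
def stub_generate(question: str) -> list[str]:
    return [question, question + " explained"]


def stub_retrieve(query: str) -> list[str]:
    return ["d1", "d2"] if "explained" in query else ["d2", "d3"]


def stub_answer(question: str, docs: list[str]) -> dict:
    return {"question": question, "context": docs}


result = rag_fusion("mortgage points", stub_generate, stub_retrieve, stub_answer)
print(result)  # {'question': 'mortgage points', 'context': ['d2', 'd1']}
```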

## **💡 Learning Outcomes**
Students will understand:
- Advanced query expansion and diversification techniques
- Mathematical ranking fusion algorithms (RRF)
- Multi-perspective document retrieval strategies
- Production-grade vector database integration (Qdrant)
- Enterprise RAG architecture patterns
- Performance monitoring with LangSmith
- Quality evaluation strategies for complex RAG systems

## **Initial Setup**

In [None]:
! pip install --quiet athina langsmith

In [None]:
import os
from google.colab import userdata
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
os.environ['ATHINA_API_KEY'] = userdata.get('ATHINA_API_KEY')
os.environ['QDRANT_API_KEY'] = userdata.get('QDRANT_API_KEY')
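
Outside Colab, `google.colab.userdata` is not available, so the keys would need to be set in the environment some other way. The sketch below is an optional check, assuming only the three key names used above; `missing_keys` is a hypothetical helper that reports which of them are unset or empty.

```python
REQUIRED_KEYS = ("OPENAI_API_KEY", "ATHINA_API_KEY", "QDRANT_API_KEY")


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the names of required keys that are unset or empty in env."""
    return [name for name in required if not env.get(name)]


# Example with a plain dict standing in for os.environ.
example_env = {"OPENAI_API_KEY": "placeholder", "ATHINA_API_KEY": ""}
print(missing_keys(example_env))  # ['ATHINA_API_KEY', 'QDRANT_API_KEY']
```

Calling `missing_keys(os.environ)` before the indexing cells gives an early, readable signal instead of an authentication failure partway through the notebook.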

## **Indexing**

In [None]:
# load embedding model
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

In [None]:
# load data
from langchain_community.document_loaders import CSVLoader
loader = CSVLoader("./context.csv")
documents = loader.load()

In [None]:
# split documents
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
documents = text_splitter.split_documents(documents)

## **Qdrant Vector Database**

In [None]:
# create vectorstore
from langchain_community.vectorstores import Qdrant
vectorstore = Qdrant.from_documents(
    documents,
    embeddings,
    url="your_qdrant_url",
    prefer_grpc=True,
    collection_name="documents",
    api_key=os.environ["QDRANT_API_KEY"],
)

## **Chromadb (Optional)**

In [None]:
# # optional vectorstore
# !pip install chromadb
# # create vectorstore
# from langchain_community.vectorstores import Chroma
# vectorstore = Chroma.from_documents(documents, embeddings)

## **Retriever**

In [None]:
# create retriever
retriever = vectorstore.as_retriever()

## **Reciprocal Rank Fusion Chain**

In [None]:
# create llm
from langchain_openai import ChatOpenAI
llm = ChatOpenAI()

In [None]:
# pull the query-generation prompt via the LangSmith client
from langchain_core.output_parsers import StrOutputParser
from langsmith import Client
client = Client()
prompt = client.pull_prompt("langchain-ai/rag-fusion-query-generation")

In [None]:
# generate queries
generate_queries = (
    prompt | ChatOpenAI(temperature=0) | StrOutputParser() | (lambda x: x.split("\n"))
)
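
Splitting the raw LLM output on newlines can leave empty strings, list numbering, or duplicate queries in the result. The sketch below shows one possible clean-up step; `parse_queries` is a hypothetical helper, and the numbered output format it handles is an assumption about what the model returns.

```python
import re
from typing import Optional


def parse_queries(raw: str, original: Optional[str] = None) -> list[str]:
    """Turn raw LLM output into a clean, de-duplicated list of queries."""
    queries: list[str] = []
    for line in raw.splitlines():
        # Drop list markers such as "1. ", "2) ", "- " or "* ".
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        if cleaned and cleaned not in queries:
            queries.append(cleaned)
    # Optionally keep the user's original question as the first query.
    if original and original not in queries:
        queries.insert(0, original)
    return queries


raw_output = (
    "1. what are mortgage points\n"
    "\n"
    "2. how do discount points work\n"
    "- what are mortgage points\n"
)
print(parse_queries(raw_output))
# ['what are mortgage points', 'how do discount points work']
```

A function like this could take the place of the `lambda x: x.split("\n")` step in `generate_queries`, so that empty strings are never sent to the retriever.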

In [None]:
# fuse and rerank results
from langchain_core.load import dumps, loads

def reciprocal_rank_fusion(results: list[list], k=60):
    """Fuse several ranked document lists into one list of (document, score) pairs."""
    fused_scores = {}
    for docs in results:
        # Assumes the docs are returned in sorted order of relevance
        for rank, doc in enumerate(docs):
            # Serialize the document so it can be used as a dictionary key
            doc_str = dumps(doc)
            if doc_str not in fused_scores:
                fused_scores[doc_str] = 0
            # rank is 0-based here, so the top hit contributes 1 / k
            fused_scores[doc_str] += 1 / (rank + k)

    # Sort by fused score, highest first, and deserialize back to documents
    reranked_results = [
        (loads(doc), score)
        for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    ]
    return reranked_results

In [None]:
# create chain
chain = generate_queries | retriever.map() | reciprocal_rank_fusion

In [None]:
# check input schema
chain.input_schema.schema()


{'title': 'PromptInput',
 'type': 'object',
 'properties': {'original_query': {'title': 'Original Query',
   'type': 'string'}}}

In [None]:
# rerank results
chain.invoke("what are points on a mortgage")

## **RAG Chain**

In [None]:
from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import ChatPromptTemplate

template = """Answer the question based only on the following context.
If you don't find the answer in the context, just say that you don't know.

Context: {context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

rag_fusion_chain = (
    {
        "context": chain,
        "question": RunnablePassthrough()
    }
    | prompt
    | llm
    | StrOutputParser()
)
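
As written, the chain hands the fused list of `(document, score)` tuples to the prompt directly, so the context is rendered from the list's string form, scores and metadata included. One optional refinement is to keep only the top few documents and join their text. The sketch below assumes a hypothetical `Doc` stand-in for a LangChain `Document` (only the `page_content` attribute is used) and a hypothetical helper `format_context`.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str


def format_context(fused: list[tuple], top_n: int = 4) -> str:
    """Keep the top_n fused documents and join their text with blank lines."""
    top_docs = [doc for doc, _score in fused[:top_n]]
    return "\n\n".join(doc.page_content for doc in top_docs)


fused = [
    (Doc("Points are pre-paid interest."), 0.049),
    (Doc("One point equals one percent of the loan amount."), 0.033),
    (Doc("Unrelated text."), 0.016),
]
context = format_context(fused, top_n=2)
print(context)
```

Since LCEL coerces plain functions into runnables, something like `"context": chain | format_context` should slot into the dictionary above, though that wiring is not part of the original notebook.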

In [None]:
rag_fusion_chain.invoke("what are points on a mortgage")


'Points on a mortgage are a form of pre-paid interest that borrowers can offer to pay a lender in order to reduce the interest rate on the loan, resulting in a lower monthly payment.'

## **Preparing Data for Evaluation**

In [None]:
question = ["what are points on a mortgage"]
response = []
contexts = []
ground_truths = ["Points, sometimes also called a 'discount point', are a form of pre-paid interest."]

# Inference
for query in question:
    response.append(rag_fusion_chain.invoke(query))
    # retriever.invoke replaces the deprecated get_relevant_documents
    contexts.append([doc.page_content for doc in retriever.invoke(query)])

# To dict
data = {
    "query": question,
    "response": response,
    "context": contexts,
    "ground_truth": ground_truths
}
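
The loop above can be wrapped in a small function so that the answer source and the context source are explicit and swappable, for example to record the fused contexts instead of the plain retriever's results. This is a sketch with hypothetical names (`build_eval_data`, `answer_fn`, `context_fn`) and stub callables; it is not part of Athina or LangChain.

```python
from typing import Callable, Iterable


def build_eval_data(
    questions: list[str],
    ground_truths: list[str],
    answer_fn: Callable[[str], str],
    context_fn: Callable[[str], Iterable[str]],
) -> dict:
    """Collect one response and one context list per question."""
    if len(questions) != len(ground_truths):
        raise ValueError("questions and ground_truths must have the same length")
    responses, contexts = [], []
    for query in questions:
        responses.append(answer_fn(query))
        contexts.append(list(context_fn(query)))
    return {
        "query": list(questions),
        "response": responses,
        "context": contexts,
        "ground_truth": list(ground_truths),
    }


# Stub callables standing in for the RAG chain and the retriever.
eval_data = build_eval_data(
    ["what are points on a mortgage"],
    ["Points are a form of pre-paid interest."],
    answer_fn=lambda q: "stub answer",
    context_fn=lambda q: ["stub context 1", "stub context 2"],
)
print(eval_data["context"])  # [['stub context 1', 'stub context 2']]
```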

In [None]:
# create dataset
from datasets import Dataset
dataset = Dataset.from_dict(data)

In [None]:
# create dataframe
import pandas as pd
df = pd.DataFrame(dataset)

In [None]:
df


| | query | response | context | ground_truth |
|---|-------|----------|---------|--------------|
| 0 | what are points on a mortgage | Points on a mortgage are a form of pre-paid in... | [context: ["Discount points, also called mortg... | Points, sometimes also called a 'discount poin... |


In [None]:
# Convert to dictionary
df_dict = df.to_dict(orient='records')

# Convert context to list
for record in df_dict:
    if not isinstance(record.get('context'), list):
        if record.get('context') is None:
            record['context'] = []
        else:
            record['context'] = [record['context']]

## **Evaluation in Athina AI**

We will use the **Answer Relevancy** eval here. It measures how pertinent the generated response is to the given prompt. Please refer to our [documentation](https://docs.athina.ai/api-reference/evals/preset-evals/overview) for further details.

In [None]:
# set api keys for Athina evals
from athina.keys import AthinaApiKey, OpenAiApiKey
OpenAiApiKey.set_key(os.getenv('OPENAI_API_KEY'))
AthinaApiKey.set_key(os.getenv('ATHINA_API_KEY'))

In [None]:
# load dataset
from athina.loaders import Loader
dataset = Loader().load_dict(df_dict)

In [None]:
# evaluate
from athina.evals import RagasAnswerRelevancy
RagasAnswerRelevancy(model="gpt-4o").run_batch(data=dataset).to_df()


evaluating with [answer_relevancy]


100%|██████████| 1/1 [00:01<00:00,  1.11s/it]


You can view your dataset at: https://app.athina.ai/develop/f9377fc3-cd03-4feb-9b52-c8e747590c5b


| | query | context | response | expected_response | display_name | failed | grade_reason | runtime | model | ragas_answer_relevancy |
|---|-------|---------|----------|-------------------|--------------|--------|--------------|---------|-------|------------------------|
| 0 | what are points on a mortgage | [context: ["Discount points, also called mortgage points or simply points, are a form of pre-paid interest available in the United States when arranging a mortgage. One point equals one percent of the loan amount. By charging a borrower points, a lender effectively increases the yield on the loan above the amount of the stated interest rate. Borrowers can offer to pay a lender points as a method to reduce the interest rate on the loan, thus obtaining a lower monthly payment in exchange for ... | Points on a mortgage are a form of pre-paid interest available in the United States when arranging a mortgage. One point equals one percent of the loan amount. Borrowers can offer to pay points to reduce the interest rate on the loan, thus obtaining a lower monthly payment in exchange for this up-front payment. | | Ragas Answer Relevancy | | A response is deemed relevant when it directly and appropriately addresses the original query. Importantly, our assessment of answer relevance does not consider factuality but instead penalizes cases where the response lacks completeness or contains redundant details | 1454 | gpt-4o | 0.919036 |
