# Document Reranking in RAG Systems

Reranking is a second-stage step in retrieval systems, especially RAG pipelines, that works in two stages:

1. **First Stage**: Use a fast retriever (such as BM25, a FAISS vector index, or a hybrid of the two) to fetch the top-k documents quickly.

2. **Second Stage**: Use a more accurate but slower model (such as a cross-encoder or an LLM) to re-score and reorder those documents by relevance to the query.

👉 **Benefits**: The most relevant documents move to the top, which improves the LLM's final answer. Retrieval stays fast because the slower model scores only the top-k candidates rather than the whole corpus.

This notebook demonstrates how to implement LLM-based reranking using Google's Gemini model.
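
Before the real pipeline, the two-stage idea can be sketched in plain Python. This is only a toy illustration: `cheap_score`, `careful_score`, and `two_stage_search` are hypothetical names, and both scorers are simple token-overlap stand-ins for a real retriever and a real reranking model.

```python
# Toy two-stage pipeline. Both scorers are illustrative stand-ins,
# not the retriever or reranker used later in this notebook.

def cheap_score(query: str, doc: str) -> int:
    """Stage 1 stand-in: number of shared lowercase tokens (fast, coarse)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def careful_score(query: str, doc: str) -> float:
    """Stage 2 stand-in: Jaccard overlap, which penalizes long, diluted text."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0


def two_stage_search(query, corpus, k=3, top_n=2):
    # Stage 1: keep the k best candidates under the cheap scorer.
    candidates = sorted(corpus, key=lambda d: cheap_score(query, d), reverse=True)[:k]
    # Stage 2: re-score only those k candidates with the slower scorer.
    return sorted(candidates, key=lambda d: careful_score(query, d), reverse=True)[:top_n]


corpus = [
    "memory and tools in langchain plus many unrelated words about vector indexes and storage",
    "langchain memory and tools",
    "bm25 is a ranking function",
    "faiss performs nearest neighbor search",
]
results = two_stage_search("langchain memory tools", corpus)
print(results)
```

The first two documents tie in stage 1 (three shared tokens each), and stage 2 breaks the tie in favor of the shorter, more focused one.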

In [1]:
"""
Import necessary libraries for document processing, retrieval, and LLM interaction.

Libraries used:
- langchain: For document processing and LLM integration
- langchain_google_genai: For Google's Gemini models and embeddings
"""

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI



In [2]:
"""
STEP 1: Document Loading and Preprocessing
Load the source document and split it into manageable chunks for retrieval.
"""

# Load text file containing LangChain documentation
loader = TextLoader("langchain-sample.txt")
raw_docs = loader.load()

# Split text into smaller chunks for better retrieval performance
# chunk_size=500: Each chunk contains at most 500 characters
# chunk_overlap=50: Adjacent chunks may share up to 50 characters to maintain context
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = splitter.split_documents(raw_docs)

print(f"✓ Loaded and split document into {len(docs)} chunks")
docs

[Document(metadata={'source': 'langchain-sample.txt'}, page_content='LangChain is a flexible framework designed for developing applications powered by large language models (LLMs). It provides tools and abstractions to work with LLMs more effectively and includes components for prompt management, chains, memory, and agents.'),
 Document(metadata={'source': 'langchain-sample.txt'}, page_content='LangChain integrates with many third-party services such as OpenAI, Hugging Face, and Cohere. This enables developers to experiment with different models and optimize performance for specific use cases like summarization, question answering, or translation.'),
 Document(metadata={'source': 'langchain-sample.txt'}, page_content='Retrieval-Augmented Generation (RAG) is a powerful technique where external knowledge is retrieved and passed into the prompt to ground LLM responses. LangChain makes it easy to implement RAG using vector databases like FAISS, Chroma, and Pinecone.\nBM25 is a traditional 
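
To make `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified sketch using fixed-width character windows. It is an assumption-laden illustration, not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs and newlines); `sliding_chunks` is a hypothetical helper.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Fixed-width windows; each window starts (chunk_size - chunk_overlap) after the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each chunk begins with the last three characters of the one before it, which is the context-sharing effect the overlap setting is meant to provide.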

In [None]:
"""
STEP 2: Define the User Query
This is the question we want to find relevant documents for.
"""

# Example query about LangChain application development
query = "How can i use langchain to build an application with memory and tools?"
print(f"User Query: {query}")

In [4]:
"""
STEP 3: Environment Setup
Load API credentials securely from environment variables.
"""

import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Get Google API key with error handling
GOOGLE_API_KEY = os.getenv('GOOGLE_API_KEY')
if not GOOGLE_API_KEY:
    raise ValueError("GOOGLE_API_KEY not found in environment variables")

print("✓ API keys loaded successfully")

✓ API keys loaded successfully


In [None]:
"""
STEP 4: Initialize the Reranking LLM
Set up Google's Gemini model that will be used for document reranking.
"""

# Initialize the Gemini model that will rerank the retrieved documents
llm = ChatGoogleGenerativeAI(
    google_api_key=GOOGLE_API_KEY,
    model="gemini-2.0-flash",    # Fast Gemini model, chosen to keep reranking latency low
    temperature=0,               # Deterministic output for consistency in ranking
    max_tokens=None,            # Use model default token limit
    timeout=None,               # No timeout limit for ranking requests
    max_retries=2,              # Retry failed requests twice
)

print("✓ Gemini model initialized for reranking")

In [None]:
"""
STEP 5: Create Vector Store and Initial Retriever
Set up the first-stage retriever using Google embeddings and FAISS vector store.
"""

from langchain_community.vectorstores import FAISS

# GOOGLE_API_KEY was already loaded from the environment in STEP 3

# Initialize Google embeddings model for vector representation
embeddings = GoogleGenerativeAIEmbeddings(
    model="models/gemini-embedding-001", 
    google_api_key=GOOGLE_API_KEY
)

# Create FAISS vector store from document chunks
vectorstore_google = FAISS.from_documents(docs, embeddings)

# Configure retriever to fetch top 8 documents initially
# This gives us more candidates for the reranking stage
retriever_google = vectorstore_google.as_retriever(search_kwargs={"k": 8})

print("✓ Vector store created and retriever configured")

In [16]:
"""
Display the configured retriever object for verification.
"""
retriever_google

VectorStoreRetriever(tags=['FAISS', 'GoogleGenerativeAIEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x7d012c043560>, search_kwargs={'k': 8})
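
Conceptually, the first-stage retriever embeds the query and returns the `k` chunks whose vectors are closest to it. The sketch below shows that idea with brute-force cosine similarity over tiny hand-written vectors. Both are assumptions for illustration: the FAISS store may rank by a different distance (such as Euclidean), and real embeddings have far more dimensions. `cosine` and `top_k` are hypothetical helpers.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k):
    """Return the positions of the k document vectors most similar to the query."""
    scored = sorted(enumerate(doc_vecs), key=lambda pair: cosine(query_vec, pair[1]), reverse=True)
    return [i for i, _ in scored[:k]]


# Hand-made 2-D "embeddings", purely for illustration.
doc_vecs = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0], [-1.0, 0.0]]
nearest = top_k([1.0, 0.1], doc_vecs, k=2)
print(nearest)
```

Retrieving more candidates than the final answer needs (here `k=8` in the notebook) gives the reranker room to promote a document that the vector search placed lower.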

In [None]:
"""
STEP 6: Create Reranking Prompt Template
Define the prompt that will instruct the LLM how to rank documents by relevance.
"""

# Prompt template for document reranking
prompt = PromptTemplate.from_template("""
You are a helpful assistant. Your task is to rank the following documents from most to least relevant to the user's question.

User Question: "{question}"

Documents:
{documents}

Instructions:
- Analyze each document's content and its relevance to the user's question
- Consider semantic similarity, topic alignment, and information completeness
- Return a list of document indices in ranked order, starting from the most relevant
- Only include indices that correspond to actual documents

Output format: comma-separated document indices, using the numbers shown in the list above (e.g., 2,1,3,...)
""")

print("✓ Reranking prompt template created")
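
To see roughly what the LLM receives once the placeholders are filled, the sketch below renders a shortened version of the template with plain `str.format`. The shortened `TEMPLATE`, the two sample documents, and the sample question are all made up for illustration; the notebook itself fills the template through `PromptTemplate`.

```python
# Shortened stand-in for the reranking template above (illustrative only).
TEMPLATE = (
    'User Question: "{question}"\n\n'
    "Documents:\n{documents}\n\n"
    "Output format: comma-separated document indices (e.g., 2,1)"
)

# Hypothetical documents, numbered from 1 like the real ones in STEP 8.
sample_docs = ["Memory keeps conversation state.", "FAISS indexes vectors."]
numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(sample_docs, start=1))

rendered = TEMPLATE.format(question="How do I add memory?", documents=numbered)
print(rendered)
```

Because the model answers with the numbers it sees in this rendered text, the numbering scheme in the prompt and the index conversion in the parsing step have to agree.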

In [18]:
"""
STEP 7: Perform Initial Retrieval
Use the vector store retriever to get the initial set of candidate documents.
"""

# Retrieve top-k documents using semantic similarity search
retrieved_docs = retriever_google.invoke(query)

print(f"✓ Retrieved {len(retrieved_docs)} initial documents")
retrieved_docs

[Document(id='32f87b94-2fc7-4950-9bc3-7fa11ca1ec0b', metadata={'source': 'langchain-sample.txt'}, page_content='LangChain supports tool integration including web search, calculators, and APIs, allowing LLMs to interact with external systems and respond more accurately to dynamic queries.\nMemory in LangChain enables context retention across multiple steps in a conversation or task, making the application more coherent and stateful.'),
 Document(id='59dd29ba-04da-444e-ae07-ac0f037d29af', metadata={'source': 'langchain-sample.txt'}, page_content='LangChain is a flexible framework designed for developing applications powered by large language models (LLMs). It provides tools and abstractions to work with LLMs more effectively and includes components for prompt management, chains, memory, and agents.'),
 Document(id='1eb5b4d9-5be9-4949-9233-44764a1dba8e', metadata={'source': 'langchain-sample.txt'}, page_content='FAISS is a popular library used for fast approximate nearest neighbor search 

In [19]:
"""
Create the reranking chain by combining prompt, LLM, and output parser.
"""

# Chain: Prompt → LLM → String Output Parser
chain = prompt | llm | StrOutputParser()

print("✓ Reranking chain created")
chain

PromptTemplate(input_variables=['documents', 'question'], input_types={}, partial_variables={}, template='\nYou are a helpful assistant. Your task is to rank the following documents from most to least relevant to the user\'s question.\n\nUser Question: "{question}"\n\nDocuments:\n{documents}\n\nInstructions:\n- Think about the relevance of each document to the user\'s question.\n- Return a list of document indices in ranked order, starting from the most relevant.\n\nOutput format: comma-separated document indices (e.g., 2,1,3,0,...)\n')
| ChatGoogleGenerativeAI(model='models/gemini-2.0-flash', google_api_key=SecretStr('**********'), temperature=0.0, max_retries=2, client=<google.ai.generativelanguage_v1beta.services.generative_service.client.GenerativeServiceClient object at 0x7d012c068c80>, default_metadata=())
| StrOutputParser()

In [None]:
"""
STEP 8: Format Documents for Reranking
Prepare the retrieved documents in a numbered format for the LLM to rank.
"""

# Format documents with indices for the reranking prompt
# Each document gets a number starting from 1
doc_lines = [f"{i+1}. {doc.page_content}" for i, doc in enumerate(retrieved_docs)]
formatted_docs = "\n".join(doc_lines)

print("✓ Documents formatted for reranking")

In [21]:
"""
Display the formatted document lines for verification.
"""
doc_lines

['1. LangChain supports tool integration including web search, calculators, and APIs, allowing LLMs to interact with external systems and respond more accurately to dynamic queries.\nMemory in LangChain enables context retention across multiple steps in a conversation or task, making the application more coherent and stateful.',
 '2. LangChain is a flexible framework designed for developing applications powered by large language models (LLMs). It provides tools and abstractions to work with LLMs more effectively and includes components for prompt management, chains, memory, and agents.',
 '3. FAISS is a popular library used for fast approximate nearest neighbor search in high-dimensional spaces. It supports both flat and compressed indexes, which makes it scalable for large document stores.\nAgents in LangChain are chains that use LLMs to decide which tools to use and in what order. This makes them suitable for multi-step tasks like question answering with search and code execution.',


In [23]:
"""
Display the complete formatted documents string that will be sent to the LLM.
"""
print("📄 Formatted Documents for Reranking:")
print("=" * 50)
print(formatted_docs)

1. LangChain supports tool integration including web search, calculators, and APIs, allowing LLMs to interact with external systems and respond more accurately to dynamic queries.
Memory in LangChain enables context retention across multiple steps in a conversation or task, making the application more coherent and stateful.
2. LangChain is a flexible framework designed for developing applications powered by large language models (LLMs). It provides tools and abstractions to work with LLMs more effectively and includes components for prompt management, chains, memory, and agents.
3. FAISS is a popular library used for fast approximate nearest neighbor search in high-dimensional spaces. It supports both flat and compressed indexes, which makes it scalable for large document stores.
Agents in LangChain are chains that use LLMs to decide which tools to use and in what order. This makes them suitable for multi-step tasks like question answering with search and code execution.
4. LangChain i

In [24]:
"""
STEP 9: Execute Reranking
Send the query and documents to the LLM for reranking.
"""

# Invoke the reranking chain with the query and formatted documents
response = chain.invoke({
    "question": query,
    "documents": formatted_docs
})

print("✓ Reranking completed")
print(f"LLM Response: {response}")
response

'2, 1, 3, 5, 4, 6'

In [25]:
"""
STEP 10: Parse Reranking Response
Extract and validate the document indices from the LLM's response.
"""

# Parse the comma-separated indices and convert to 0-based indexing
# LLM returns 1-based indices, we need 0-based for Python list access
indices = [int(x.strip()) - 1 for x in response.split(",") if x.strip().isdigit()]

print(f"✓ Parsed reranked indices: {indices}")
indices

[1, 0, 2, 4, 3, 5]
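
LLM output does not always follow the requested format exactly. With the split-and-`isdigit()` approach above, a reply such as `"Ranking: 2, 1"` silently loses its first index, and repeated indices are kept. One possible, more tolerant parser is sketched below; `parse_ranking` is a hypothetical helper, and pulling every number out with a regular expression is an assumption that can misfire if the model adds prose containing unrelated numbers.

```python
import re


def parse_ranking(response: str, num_docs: int) -> list:
    """Extract 1-based indices from free-form text; return unique, in-range 0-based indices."""
    seen = set()
    order = []
    for token in re.findall(r"\d+", response):
        idx = int(token) - 1  # the prompt numbers documents from 1
        if 0 <= idx < num_docs and idx not in seen:
            seen.add(idx)
            order.append(idx)
    return order


clean = parse_ranking("2, 1, 3, 5, 4, 6", num_docs=8)
messy = parse_ranking("Ranking: 2, 2, 9, 1.", num_docs=3)
print(clean)
print(messy)
```

The first call reproduces the indices shown above; the second drops the duplicate `2` and the out-of-range `9` while still recovering the index that follows the `Ranking:` prefix.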

In [26]:
"""
Display original retrieved documents for comparison.
"""
print("📋 Original Retrieved Documents:")
for i, doc in enumerate(retrieved_docs, 1):  # 1-based, matching the numbers sent to the LLM
    print(f"\nDocument {i}: {doc.page_content[:100]}...")
    
retrieved_docs

[Document(id='32f87b94-2fc7-4950-9bc3-7fa11ca1ec0b', metadata={'source': 'langchain-sample.txt'}, page_content='LangChain supports tool integration including web search, calculators, and APIs, allowing LLMs to interact with external systems and respond more accurately to dynamic queries.\nMemory in LangChain enables context retention across multiple steps in a conversation or task, making the application more coherent and stateful.'),
 Document(id='59dd29ba-04da-444e-ae07-ac0f037d29af', metadata={'source': 'langchain-sample.txt'}, page_content='LangChain is a flexible framework designed for developing applications powered by large language models (LLMs). It provides tools and abstractions to work with LLMs more effectively and includes components for prompt management, chains, memory, and agents.'),
 Document(id='1eb5b4d9-5be9-4949-9233-44764a1dba8e', metadata={'source': 'langchain-sample.txt'}, page_content='FAISS is a popular library used for fast approximate nearest neighbor search 

In [27]:
"""
STEP 11: Apply Reranking Results
Reorder the documents based on the LLM's ranking with validation.
"""

# Reorder documents according to the LLM's ranking
# Only include valid indices to avoid index errors
reranked_docs = [retrieved_docs[i] for i in indices if 0 <= i < len(retrieved_docs)]

print(f"✓ Successfully reranked {len(reranked_docs)} documents")
reranked_docs

[Document(id='59dd29ba-04da-444e-ae07-ac0f037d29af', metadata={'source': 'langchain-sample.txt'}, page_content='LangChain is a flexible framework designed for developing applications powered by large language models (LLMs). It provides tools and abstractions to work with LLMs more effectively and includes components for prompt management, chains, memory, and agents.'),
 Document(id='32f87b94-2fc7-4950-9bc3-7fa11ca1ec0b', metadata={'source': 'langchain-sample.txt'}, page_content='LangChain supports tool integration including web search, calculators, and APIs, allowing LLMs to interact with external systems and respond more accurately to dynamic queries.\nMemory in LangChain enables context retention across multiple steps in a conversation or task, making the application more coherent and stateful.'),
 Document(id='1eb5b4d9-5be9-4949-9233-44764a1dba8e', metadata={'source': 'langchain-sample.txt'}, page_content='FAISS is a popular library used for fast approximate nearest neighbor search 
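
The list comprehension above keeps only the documents the LLM mentioned, so anything it leaves out of its answer disappears from the results. If that is not wanted, one option is to append the omitted documents in their original retrieval order. The sketch below shows this on plain strings; `apply_ranking` is a hypothetical helper, and keeping omitted items at the end is a design assumption, not something the notebook does.

```python
def apply_ranking(items, order):
    """Reorder items by 0-based indices in `order`, then append any items the ranking left out."""
    seen = set()
    ranked = []
    for i in order:
        if 0 <= i < len(items) and i not in seen:
            seen.add(i)
            ranked.append(items[i])
    # Fallback: keep unmentioned items, preserving their first-stage order.
    ranked.extend(item for j, item in enumerate(items) if j not in seen)
    return ranked


reordered = apply_ranking(["a", "b", "c", "d"], [1, 0, 7, 1])
print(reordered)
```

The same function works on a list of `Document` objects, since it only relies on list indexing.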

In [28]:
"""
STEP 12: Display Final Results
Show the reranked documents in their new relevance order.
"""

# Display the final reranked results
print("\n🎯 RERANKING RESULTS")
print("=" * 60)
print(f"Query: {query}")
print(f"Original documents retrieved: {len(retrieved_docs)}")
print(f"Successfully reranked: {len(reranked_docs)}")
print("\n📊 Final Reranked Documents (Most to Least Relevant):")

for i, doc in enumerate(reranked_docs, 1):
    print(f"\n{'='*10} RANK {i} {'='*10}")
    print(f"Content: {doc.page_content}")
    if hasattr(doc, 'metadata') and doc.metadata:
        print(f"Metadata: {doc.metadata}")


📊 Final Reranked Results:

Rank 1:
LangChain is a flexible framework designed for developing applications powered by large language models (LLMs). It provides tools and abstractions to work with LLMs more effectively and includes components for prompt management, chains, memory, and agents.

Rank 2:
LangChain supports tool integration including web search, calculators, and APIs, allowing LLMs to interact with external systems and respond more accurately to dynamic queries.
Memory in LangChain enables context retention across multiple steps in a conversation or task, making the application more coherent and stateful.

Rank 3:
FAISS is a popular library used for fast approximate nearest neighbor search in high-dimensional spaces. It supports both flat and compressed indexes, which makes it scalable for large document stores.
Agents in LangChain are chains that use LLMs to decide which tools to use and in what order. This makes them suitable for multi-step tasks like question answering w

In [None]:
"""
SUMMARY: Document Reranking Pipeline Complete

This notebook demonstrates a complete document reranking pipeline:

1. ✅ Document Loading & Chunking
2. ✅ Vector Store Creation (FAISS + Google Embeddings)
3. ✅ Initial Retrieval (Top-K semantic search)
4. ✅ LLM-based Reranking (Gemini model)
5. ✅ Result Validation & Reordering

Key Benefits:
- Improved relevance ranking using LLM understanding
- Two-stage retrieval for speed + accuracy balance
- Basic validation of the LLM's output (non-numeric and out-of-range indices are skipped)

Next Steps:
- Integrate reranked documents into your RAG pipeline
- Experiment with different reranking models (cross-encoders)
- Add evaluation metrics to measure ranking quality
"""

print("✅ Document Reranking Pipeline Complete!")
print("Ready to integrate into your RAG system.")
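
For the "add evaluation metrics" step, two common ranking measures are reciprocal rank and NDCG@k. The sketch below computes both for a ranking before and after reranking. The document ids and relevance labels are invented for illustration; measuring real quality requires human or otherwise trusted relevance judgments for your own queries.

```python
import math


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant document, or 0.0 if none appears."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_ids, gains, k):
    """Normalized discounted cumulative gain over the top k positions."""
    def dcg(ids):
        return sum(gains.get(d, 0) / math.log2(pos + 1) for pos, d in enumerate(ids[:k], start=1))

    ideal = sorted(gains, key=gains.get, reverse=True)  # best possible ordering
    best = dcg(ideal)
    return dcg(ranked_ids) / best if best else 0.0


# Hypothetical graded relevance labels: higher means more relevant.
gains = {"a": 3, "b": 1}
before = ["c", "a", "b"]  # first-stage order
after = ["a", "b", "c"]   # order after reranking

rr_before = reciprocal_rank(before, set(gains))
rr_after = reciprocal_rank(after, set(gains))
ndcg_before = ndcg_at_k(before, gains, k=3)
ndcg_after = ndcg_at_k(after, gains, k=3)
print(rr_before, rr_after)
print(round(ndcg_before, 3), round(ndcg_after, 3))
```

Averaging these scores over a set of labeled queries gives a simple way to check whether reranking actually improves on the first-stage order.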