# RAG Reranking with ColBERT: Improving Retrieval Quality
**Inspired by:** [RAGatouille Reranking Example](https://github.com/AnswerDotAI/RAGatouille/blob/main/examples/04-reranking.ipynb)

In [None]:
!pip install -q ragatouille voyager numpy sentence-transformers pandas matplotlib

## 1. Setup: Load ColBERT Model for Reranking

In [None]:
from ragatouille import RAGPretrainedModel

# Load pre-trained ColBERT v2.0 model
# This model will be used for reranking (not indexing)
RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")

print("ColBERT model loaded successfully for reranking!")

## 3. Simulating an Existing RAG Pipeline

In [None]:
from sentence_transformers import SentenceTransformer
from voyager import Index, Space


class ExistingRetrievalPipeline:
    """Simulates a typical first-stage RAG retrieval pipeline using bi-encoder embeddings."""

    def __init__(self, embedder_name: str = "BAAI/bge-small-en-v1.5"):
        # Initialize the sentence transformer model
        self.embedder = SentenceTransformer(embedder_name)

        # Map to store original document content by index ID
        self.collection_map = {}

        # Create a vector index using cosine similarity
        self.index = Index(
            Space.Cosine,
            num_dimensions=self.embedder.get_sentence_embedding_dimension(),
        )

        print(f"Initialized retrieval pipeline with {embedder_name}")
        print(
            f"Embedding dimension: {self.embedder.get_sentence_embedding_dimension()}"
        )

    def index_documents(self, documents: list[dict]) -> None:
        """Index a list of documents into the vector store."""
        for document in documents:
            # Encode document content to vector embedding
            embedding = self.embedder.encode(document["content"])

            # Add to index and store mapping
            idx = self.index.add_item(embedding)
            self.collection_map[idx] = document["content"]

        print(f"Indexed {len(documents)} document chunks")

    def query(self, query: str, k: int = 10) -> list[str]:
        """Retrieve top-k most similar documents for a given query."""
        # Encode query to vector
        query_embedding = self.embedder.encode(query)

        # Search index for k nearest neighbors
        neighbor_ids = self.index.query(query_embedding, k=k)[0]

        # Retrieve original document content
        results = [self.collection_map[idx] for idx in neighbor_ids]

        return results

In [None]:
# Initialize the pipeline
retrieval_pipeline = ExistingRetrievalPipeline()

## 4. Prepare and Index Documents

We'll load a text file about climate change and agriculture, then chunk it into smaller pieces suitable for retrieval.

In [None]:
from ragatouille.data import CorpusProcessor

# Initialize the corpus processor
corpus_processor = CorpusProcessor()

# Load the document
with open("text_for_reranking.txt", "r", encoding="utf-8") as file:
    content = file.read()

print(f"Loaded document with {len(content)} characters")
print("\n--- Document Preview ---")
print(content[:300] + "...\n")

# Process and chunk the document
# chunk_size=100 tokens creates manageable pieces for retrieval
documents = corpus_processor.process_corpus([content], chunk_size=100)

print(f"Created {len(documents)} chunks from the document")
print("\n--- Example Chunk ---")
print(documents[0])

In [None]:
# Index the documents in our retrieval pipeline
retrieval_pipeline.index_documents(documents)

## 5. First-Stage Retrieval (Without Reranking)

Let's query our pipeline with a complex question about climate change and agriculture.

In [None]:
# Define our test query
query = "How does climate change affect agriculture, and what are the impacts on crop yields, farming practices, and food security?"

print(f"Query: {query}\n")

# Retrieve top 7 results using bi-encoder
raw_results = retrieval_pipeline.query(query, k=7)

print("=" * 80)
print("FIRST-STAGE RETRIEVAL RESULTS (Bi-Encoder)")
print("=" * 80)

for i, result in enumerate(raw_results, 1):
    print(f"\n[Rank {i}]")
    print(result)
    print("-" * 80)

### Analysis of First-Stage Results

Notice that while the results are generally relevant, they may not be optimally ordered. A bi-encoder compresses each passage into a single vector, so a passage that answers the question precisely can score lower than one that is only topically similar.

**Key observation:** Look for passages that directly answer the question about crop yields and food security. Are they in the top positions?
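
One way to make this check less subjective is to list the ranks at which the key phrases actually occur. The helper below is a minimal sketch, not part of RAGatouille: `ranks_mentioning` and the sample passages are hypothetical names and data invented for illustration, and plain substring matching is only a rough proxy for relevance.

```python
def ranks_mentioning(results: list[str], terms: list[str]) -> list[int]:
    """Return 1-based ranks of passages that mention every term (case-insensitive)."""
    lowered_terms = [term.lower() for term in terms]
    return [
        rank
        for rank, passage in enumerate(results, start=1)
        if all(term in passage.lower() for term in lowered_terms)
    ]


# Toy passages standing in for `raw_results` (invented for illustration)
sample_results = [
    "Rising temperatures shift growing seasons.",
    "Irrigation demand increases during droughts.",
    "Lower crop yields threaten food security in many regions.",
]
print(ranks_mentioning(sample_results, ["crop yields", "food security"]))  # [3]
```

In the notebook, the same call on `raw_results` would show whether the passages naming both phrases sit near the top or further down.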

## 6. Second-Stage Reranking with ColBERT

Now let's use ColBERT to rerank these results. Instead of comparing one vector per text, ColBERT keeps an embedding for every token and scores a passage by matching each query token against its most similar passage token (late interaction).
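
To make the token-level scoring concrete, here is a minimal NumPy sketch of the MaxSim idea: for every query token, take the highest cosine similarity to any document token, then sum those maxima. This is a simplified illustration under assumed inputs (tiny hand-made 2-dimensional "embeddings"), not RAGatouille's actual implementation; `maxsim_score` is a hypothetical helper name.

```python
import numpy as np


def maxsim_score(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """Late-interaction (MaxSim) score between one query and one document.

    Both inputs are (num_tokens, dim) arrays of token embeddings.
    """
    # L2-normalise rows so dot products are cosine similarities
    q = query_tokens / np.linalg.norm(query_tokens, axis=1, keepdims=True)
    d = doc_tokens / np.linalg.norm(doc_tokens, axis=1, keepdims=True)

    # similarity[i, j] = cosine similarity of query token i and document token j
    similarity = q @ d.T

    # For each query token keep its best-matching document token, then sum
    return float(similarity.max(axis=1).sum())


# Toy 2-dimensional "token embeddings" (made up for illustration)
query_tokens = np.array([[1.0, 0.0], [0.0, 1.0]])
doc_a = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # matches both query tokens
doc_b = np.array([[1.0, 0.0], [1.0, 0.0]])  # matches only the first query token

score_a = maxsim_score(query_tokens, doc_a)
score_b = maxsim_score(query_tokens, doc_b)
print(f"doc_a: {score_a:.2f}, doc_b: {score_b:.2f}")  # doc_a: 2.00, doc_b: 1.00
```

Because every query token contributes its own best match, a passage that covers all parts of the question outscores one that matches only some of them, which is the behaviour a single pooled vector tends to blur.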

In [None]:
# Rerank the first-stage results using ColBERT
# k=7 keeps all seven candidates so we can compare the full ordering below
reranked_results = RAG.rerank(query=query, documents=raw_results, k=7)

print("=" * 80)
print("RERANKED RESULTS (ColBERT)")
print("=" * 80)

for result in reranked_results:
    print(f"\n[Rank {result['rank']+1}] (Score: {result['score']:.4f})")
    print(result["content"])
    print("-" * 80)

## 7. Side-by-Side Comparison

Let's compare all 7 results before and after reranking to see how the ordering changes.

In [None]:
import pandas as pd

# Prepare comparison data
comparison_data = []

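# Compare only as many rows as both result lists contain
for i in range(min(len(raw_results), len(reranked_results))):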
    comparison_data.append(
        {
            "Rank": i + 1,
            "Before Reranking (Bi-Encoder)": raw_results[i][:150] + "...",
            "After Reranking (ColBERT)": reranked_results[i]["content"][:150] + "...",
            "ColBERT Score": f"{reranked_results[i]['score']:.4f}",
        }
    )

comparison_df = pd.DataFrame(comparison_data)

print("\n" + "=" * 120)
print("COMPARISON: Before vs After Reranking")
print("=" * 120)
print(comparison_df.to_string(index=False))

## 8. Visualizing the Impact of Reranking

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Create a visualization showing rank changes
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

# Left plot: ColBERT scores after reranking
ranks = [r["rank"] + 1 for r in reranked_results]
scores = [r["score"] for r in reranked_results]

ax1.bar(ranks, scores, color="steelblue", alpha=0.7)
ax1.set_xlabel("Rank After Reranking", fontsize=12)
ax1.set_ylabel("ColBERT Relevance Score", fontsize=12)
ax1.set_title(
    "Document Relevance Scores After Reranking", fontsize=14, fontweight="bold"
)
ax1.set_xticks(ranks)
ax1.grid(axis="y", alpha=0.3)

# Right plot: Rank changes visualization
# Map reranked documents back to their original positions
original_positions = []
for reranked in reranked_results:
    for i, original in enumerate(raw_results):
        if reranked["content"] == original:
            original_positions.append(i + 1)
            break

# Plot rank movement
for old_pos, new_pos in zip(original_positions, ranks):
    color = "green" if old_pos > new_pos else "red" if old_pos < new_pos else "gray"
    ax2.plot(
        [0, 1],
        [old_pos, new_pos],
        "o-",
        color=color,
        linewidth=2,
        markersize=8,
        alpha=0.7,
    )
    ax2.text(-0.1, old_pos, f"{old_pos}", fontsize=10, ha="right", va="center")
    ax2.text(1.1, new_pos, f"{new_pos}", fontsize=10, ha="left", va="center")

ax2.set_xlim(-0.3, 1.3)
ax2.set_ylim(0, len(raw_results) + 1)
ax2.set_xticks([0, 1])
ax2.set_xticklabels(["Before\nReranking", "After\nReranking"], fontsize=11)
ax2.set_ylabel("Rank Position", fontsize=12)
ax2.set_title("Rank Changes from Reranking", fontsize=14, fontweight="bold")
ax2.invert_yaxis()
ax2.grid(axis="y", alpha=0.3)

# Add legend
from matplotlib.lines import Line2D

legend_elements = [
    Line2D([0], [0], color="green", linewidth=2, label="Moved Up"),
    Line2D([0], [0], color="red", linewidth=2, label="Moved Down"),
    Line2D([0], [0], color="gray", linewidth=2, label="No Change"),
]
ax2.legend(handles=legend_elements, loc="upper right")

plt.tight_layout()
plt.show()

## 9. Understanding the Trade-offs

### When to Use Reranking

**Reranking is ideal when:**
- You have an existing retrieval pipeline you don't want to rebuild
- You need to improve relevance without major infrastructure changes
- Your first-stage retriever is fast but not highly accurate
- You can afford slight additional latency for better quality
- Your candidate set (k) is moderate (10-100 documents)

**Full ColBERT indexing is better when:**
- You're building a new system from scratch
- You need maximum retrieval quality
- You have resources for offline index building
- Your query volume justifies the infrastructure

### Performance Characteristics

Let's measure the computational cost of reranking:
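
A minimal timing sketch is shown below. It is an assumption about how one might measure this rather than the notebook's original cell: `time_call` is a hypothetical helper, and `fake_rerank` is a stand-in workload so the example runs without loading a model.

```python
import time
from statistics import mean
from typing import Callable


def time_call(fn: Callable[[], object], repeats: int = 5) -> dict[str, float]:
    """Run `fn` several times and report mean and worst wall-clock latency in ms."""
    timings_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - start) * 1000)
    return {"mean_ms": mean(timings_ms), "max_ms": max(timings_ms)}


# Stand-in workload (hypothetical) so the sketch runs on its own
def fake_rerank() -> list[str]:
    return sorted(["passage c", "passage a", "passage b"])


stats = time_call(fake_rerank, repeats=5)
print(f"mean: {stats['mean_ms']:.3f} ms, max: {stats['max_ms']:.3f} ms")
```

In this notebook you would pass `lambda: RAG.rerank(query=query, documents=raw_results, k=7)` instead of `fake_rerank`. The first call is often slower than later ones because of model warm-up, so it can be worth reporting it separately.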

## 10. Best Practices and Tips

### Choosing K Values

1. **First-stage K (k1)**: Retrieve more candidates than needed
   - Typical range: 20-100
   - Higher k1 gives reranker more options but increases cost
   - Rule of thumb: k1 = 2-5x your final target

2. **Reranking K (k2)**: Final number of documents
   - Typical range: 3-10
   - Depends on your LLM's context window
   - More isn't always better (noise vs. signal)
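
The relationship between k1 and k2 can be sketched as a small wrapper that over-fetches in the first stage and truncates after reranking. Everything here is illustrative: `retrieve_then_rerank`, `toy_first_stage`, `toy_rerank`, and the toy corpus are hypothetical stand-ins, and the defaults (k1=20, k2=5) are simply one choice within the ranges above.

```python
from typing import Callable


def retrieve_then_rerank(
    query: str,
    first_stage: Callable[[str, int], list[str]],
    rerank: Callable[[str, list[str]], list[str]],
    k1: int = 20,
    k2: int = 5,
) -> list[str]:
    """Fetch k1 candidates cheaply, reorder them, and keep the best k2."""
    if k2 > k1:
        raise ValueError("k2 (final count) should not exceed k1 (candidate count)")
    candidates = first_stage(query, k1)
    return rerank(query, candidates)[:k2]


# Toy corpus (invented): 30 generic passages plus one that answers the query
corpus = [f"passage {i} about farming" for i in range(30)]
corpus.append("crop yields and food security")


def toy_first_stage(query: str, k: int) -> list[str]:
    # Pretend the last k corpus entries are the nearest neighbours
    return corpus[-k:]


def toy_rerank(query: str, docs: list[str]) -> list[str]:
    # Order by word overlap with the query, highest overlap first
    query_words = set(query.lower().split())
    return sorted(docs, key=lambda doc: -len(query_words & set(doc.lower().split())))


top = retrieve_then_rerank("crop yields", toy_first_stage, toy_rerank, k1=20, k2=5)
print(len(top), "|", top[0])  # 5 | crop yields and food security
```

With the objects from this notebook, `first_stage` could be `retrieval_pipeline.query` and `rerank` a thin wrapper that calls `RAG.rerank` and returns each result's `"content"`.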