# Re-ranking for Improved Retrieval Quality

In this notebook, we explore **re-ranking** as a technique to improve the quality of retrieved documents in RAG systems.

## What is Re-ranking?

Re-ranking is a two-stage retrieval approach:

1. **First Stage (Fast Retrieval)**: Use a fast embedding model (like sentence transformers) to retrieve a larger set of candidate documents (e.g., top-20 or top-50)
2. **Second Stage (Precise Re-ranking)**: Use a more sophisticated model to re-score and re-order these candidates based on relevance
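
The two stages above can be sketched with plain Python. This is a minimal illustration, not the notebook's pipeline: `two_stage_search`, `word_overlap`, and `phrase_match` are hypothetical names, and the two toy scorers merely stand in for an embedding model and a cross-encoder.

```python
from typing import Callable, List, Tuple


def two_stage_search(
    query: str,
    corpus: List[str],
    fast_score: Callable[[str, str], float],
    precise_score: Callable[[str, str], float],
    retrieve_k: int = 20,
    final_k: int = 5,
) -> List[Tuple[str, float]]:
    """Retrieve retrieve_k candidates cheaply, then re-order them with a costlier scorer."""
    # Stage 1: rank the whole corpus with the cheap scorer and keep the best candidates.
    candidates = sorted(corpus, key=lambda doc: fast_score(query, doc), reverse=True)[:retrieve_k]
    # Stage 2: re-score only the candidates with the expensive scorer.
    rescored = [(doc, precise_score(query, doc)) for doc in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:final_k]


# Toy scorers (hypothetical stand-ins for an embedding model and a cross-encoder).
def word_overlap(query: str, doc: str) -> float:
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def phrase_match(query: str, doc: str) -> float:
    return 1.0 if query.lower() in doc.lower() else 0.0


corpus = [
    "vector search with a database",
    "a vector database stores embeddings",
    "cooking pasta at home",
]
top = two_stage_search("vector database", corpus, word_overlap, phrase_match, retrieve_k=2, final_k=1)
print(top)
```

Both of the first two documents tie under the cheap scorer, and only the second stage separates them, which is the situation re-ranking is meant to handle.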

## Why Re-ranking?

### The Problem with Single-Stage Retrieval

- **Embedding models** are fast but may not capture complex semantic relationships
- **Cosine similarity** in vector space doesn't always reflect true relevance
- **Top-K results** may include irrelevant documents with high embedding similarity

### The Solution: Re-ranking

- **Better accuracy**: Re-ranking models (like cross-encoders) process query-document pairs jointly
- **Improved relevance**: More sophisticated models can better assess relevance
- **Better ranking**: Moves truly relevant documents to the top positions
- **Cost-effective**: Only re-rank a small set of candidates (not the entire corpus)

## Re-ranking Model

We'll use **BAAI/bge-reranker-base**, a widely used general-purpose re-ranking model:

- **Type**: Cross-encoder (sequence classification model)
- **Domain**: General-purpose (not tuned to one specific domain)
- **Approach**: Direct relevance scoring for query-document pairs
- **Performance**: Strong reported results on the BEIR benchmark

## Evaluation Approach

We'll compare:
- **Baseline**: Retrieval without re-ranking (embedding similarity only)
- **With Re-ranking**: Retrieval followed by re-ranking

Using metrics:
- Token-level: IoU, Precision, Recall, F1
- Passage-level: Coverage, Accuracy, Precision, Recall, F1
- Document-level: Coverage, Precision
- Ranking quality: **Mean Reciprocal Rank (MRR)**, Rank Distribution

## Setup Environment

In [None]:
# Import necessary libraries
import sys
import os
import json
import time
import pandas as pd
import numpy as np
from typing import List, Dict, Optional, Tuple
from collections import Counter
import string

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
from tqdm.auto import tqdm

sys.path.append(os.path.join(os.getcwd(), '..'))

from dotenv import load_dotenv
load_dotenv()

# Configure Qdrant connection
QDRANT_URL = os.getenv("QDRANT_URL")
QDRANT_API_KEY = os.getenv("QDRANT_API_KEY")

if not QDRANT_URL or not QDRANT_API_KEY:
    raise ValueError(
        "Qdrant credentials not found!\n"
        "Please set QDRANT_URL and QDRANT_API_KEY in your .env file."
    )

collection_name = "sample_collection.snapshot"

print(f"✓ Environment configured")
print(f"  Qdrant URL: {QDRANT_URL}")
print(f"  Collection: {collection_name}")

## Connect to Qdrant

In [None]:
try:
    qdrant_client = QdrantClient(url=QDRANT_URL, api_key=QDRANT_API_KEY)
    info = qdrant_client.get_collection(collection_name)
    print(f"✓ Connected to Qdrant")
    print(f"  Collection: {collection_name}")
    print(f"  Points: {info.points_count}")
    print(f"  Status: {info.status}")
except Exception as e:
    # Chain the original exception so the underlying traceback is preserved
    raise RuntimeError(f"Failed to connect to Qdrant: {e}") from e

## Initialize Embedding Model

In [None]:
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
print(f"✓ Embedding model loaded: all-MiniLM-L6-v2")

## Re-ranking Model Implementation

This section wraps **BAAI/bge-reranker-base**, the cross-encoder introduced above, in a small `Reranker` class.

### How it Works

A cross-encoder:
1. Takes a query-document pair as input
2. Jointly processes both through transformer layers
3. Outputs a relevance score

It is typically more accurate than a bi-encoder (which embeds the query and the document separately) but slower, because every query-document pair needs its own forward pass. That trade-off makes it a good fit for re-ranking a small candidate set.
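
To make the difference concrete, here is a toy contrast between the two scoring styles. Nothing here is a real model: `embed`, `cosine`, and `pair_score` are hypothetical stand-ins (a bag-of-words vector and a shared-bigram score), chosen only to show that a scorer which sees the pair together can pick up interactions that independent embeddings miss.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bi-encoder: each text is encoded on its own as a bag of words."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def bigrams(text: str) -> set:
    tokens = text.lower().split()
    return set(zip(tokens, tokens[1:]))


def pair_score(query: str, doc: str) -> float:
    """Toy cross-encoder: looks at query and document together (shared bigrams)."""
    q = bigrams(query)
    return len(q & bigrams(doc)) / len(q) if q else 0.0


query = "man bites dog"
docs = ["dog bites man", "man bites dog"]

# Bi-encoder style: document vectors can be computed once, ahead of time.
doc_vectors = [embed(d) for d in docs]
bi_scores = [cosine(embed(query), v) for v in doc_vectors]

# Cross-encoder style: one scoring call per (query, document) pair at query time.
cross_scores = [pair_score(query, d) for d in docs]

print("bi-encoder   :", bi_scores)
print("cross-encoder:", cross_scores)
```

The bag-of-words vectors cannot tell the two documents apart, while the pairwise scorer can. The price is that pairwise scores cannot be precomputed, so the cost grows with the number of candidates per query.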

### BGE Reranker Advantages

- **State-of-the-art**: Among the best performing rerankers on BEIR benchmark
- **General-purpose**: Works well across diverse domains and tasks
- **Efficient**: ~278M parameters (about 1.1 GB of fp32 weights), comparable in size to domain-specific alternatives
- **Production-ready**: Widely used in industry applications
- **Practical**: Can run locally without massive GPU requirements

In [None]:
class Reranker:
    """BGE Reranker implementation - state-of-the-art general-purpose reranking."""
    
    def __init__(self, model_name="BAAI/bge-reranker-base", device=None, max_length=512):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_name)
        self.device = device if device else ("cuda" if torch.cuda.is_available() else "cpu")
        self.model.to(self.device)
        self.model.eval()
        self.max_length = max_length

    def rerank(self, query: str, docs: List[str]) -> Tuple[List[str], List[float]]:
        """
        Rerank documents for a query.
        
        Args:
            query: The query string
            docs: List of documents to rerank
        
        Returns:
            Tuple of (sorted_docs, scores)
        """
        # Nothing to score: the tokenizer cannot build a batch from an empty list
        if not docs:
            return [], []

        # Tokenize query-doc pairs
        encodings = self.tokenizer(
            text=[query] * len(docs),
            text_pair=docs,
            padding=True,
            truncation=True,
            max_length=self.max_length,
            return_tensors="pt"
        ).to(self.device)

        # Compute relevance scores
        with torch.no_grad():
            logits = self.model(**encodings).logits
            # BGE reranker outputs a single logit per pair; flatten to one score per doc
            scores = logits.view(-1).float().cpu().tolist()

        # Sort by score descending (the sort is stable, so ties keep retrieval order)
        sorted_pairs = sorted(zip(docs, scores), key=lambda x: x[1], reverse=True)
        sorted_docs, sorted_scores = zip(*sorted_pairs)

        return list(sorted_docs), list(sorted_scores)

print("✓ Reranker class defined")
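
The scores returned by `rerank` are raw logits, so they are unbounded and only meaningful relative to each other. If a value in (0, 1) is easier to read or threshold, a sigmoid is a common way to map them. The helper below is a sketch with a hypothetical name (`logits_to_probabilities`), and the example logits are made up rather than real model output.

```python
import math
from typing import List


def logits_to_probabilities(logits: List[float]) -> List[float]:
    """Map raw relevance logits to (0, 1) with a numerically stable sigmoid."""
    probs = []
    for x in logits:
        if x >= 0:
            probs.append(1.0 / (1.0 + math.exp(-x)))
        else:
            # For negative inputs, exp(x) is small, which avoids overflow in exp(-x)
            e = math.exp(x)
            probs.append(e / (1.0 + e))
    return probs


example_logits = [4.2, 0.0, -3.1]  # made-up values, not real model output
probs = logits_to_probabilities(example_logits)
print([round(p, 3) for p in probs])
```

Because the sigmoid is monotonic, applying it does not change the ranking; it only rescales the scores.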

## Initialize Re-ranker

We'll initialize the BGE reranker model.

**Note**: The first run will download the model weights (roughly 1.1 GB).

In [None]:
# Initialize the BGE reranker
reranker = Reranker(max_length=512)
print("✓ BGE Reranker initialized (BAAI/bge-reranker-base)")

## Load Evaluation Dataset

We'll use the same Q&A dataset from the evaluation notebook.

In [None]:
# Load evaluation dataset
qa_dataset_file = "../data/qa_evaluation_dataset.json"

if os.path.exists(qa_dataset_file):
    with open(qa_dataset_file, 'r') as f:
        qa_dataset = json.load(f)
    print(f"✓ Loaded {len(qa_dataset)} Q&A pairs from {qa_dataset_file}")
else:
    raise FileNotFoundError(
        f"Evaluation dataset not found at {qa_dataset_file}\n"
        f"Please ensure the Q&A dataset has been generated first."
    )


print(f"Using {len(qa_dataset)} Q&A pairs for evaluation")

## Evaluation Metrics

We use comprehensive metrics at multiple levels:

### Token-Level Metrics
- **IoU**: Intersection over Union of token sets
- **Precision**: Fraction of retrieved tokens that are relevant
- **Recall**: Fraction of relevant tokens that are retrieved
- **F1**: Harmonic mean of precision and recall
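
As a small worked example of the token-level definitions, the sketch below compares two made-up sentences. It treats tokens as multisets (repeated tokens count), which is an assumption chosen to match the `Counter`-based implementation later in this notebook.

```python
from collections import Counter

reference_tokens = "the cat sat on the mat".split()
retrieved_tokens = "the cat lay on a mat".split()

ref_counts = Counter(reference_tokens)
ret_counts = Counter(retrieved_tokens)

# Multiset intersection keeps the minimum count of each shared token.
overlap = sum((ref_counts & ret_counts).values())
ref_total = sum(ref_counts.values())
ret_total = sum(ret_counts.values())

precision = overlap / ret_total                    # shared / retrieved
recall = overlap / ref_total                       # shared / reference
iou = overlap / (ref_total + ret_total - overlap)  # shared / union
f1 = 2 * precision * recall / (precision + recall)

print(f"overlap={overlap} precision={precision:.3f} recall={recall:.3f} iou={iou:.3f} f1={f1:.3f}")
```

Four tokens are shared ("the" counts once because it appears only once in the retrieved text), so precision and recall are both 4/6 and IoU is 4/8.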

### Passage-Level Metrics
- **Coverage**: Fraction of reference passages found
- **Accuracy**: Binary metric (all references found or not)
- **Precision**: Fraction of retrieved chunks containing references
- **Recall**: Same as coverage
- **F1**: Harmonic mean of precision and recall
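
The passage-level definitions can be checked by hand on a tiny made-up case. In this sketch `contains` is a hypothetical, simplified stand-in (exact substring match) for the fuzzy matching used later in the notebook.

```python
references = ["alpha beta", "gamma delta"]
retrieved = ["intro alpha beta outro", "unrelated text", "more unrelated text"]


def contains(ref: str, chunk: str) -> bool:
    """Simplified stand-in for fuzzy matching: exact substring containment."""
    return ref in chunk


# References that appear in at least one retrieved chunk
found = sum(any(contains(r, c) for c in retrieved) for r in references)
# Retrieved chunks that contain at least one reference
relevant_chunks = sum(any(contains(r, c) for r in references) for c in retrieved)

coverage = found / len(references)
accuracy = 1.0 if found == len(references) else 0.0
precision = relevant_chunks / len(retrieved)
recall = coverage
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0

print(f"coverage={coverage} accuracy={accuracy} precision={precision:.3f} f1={f1:.3f}")
```

One of two references is found (coverage 0.5, accuracy 0.0) and one of three chunks is relevant (precision 1/3), giving an F1 of 0.4.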

### Document-Level Metrics
- **Coverage**: Whether source document appears in results
- **Precision**: Fraction of retrieved chunks from source document

### Ranking Quality Metrics (NEW)
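- **Mean Reciprocal Rank (MRR)**: Average over all queries of 1/rank of the first relevant document (a query with no relevant document contributes 0)
  - Reciprocal rank = 1.0 means the first relevant doc is at rank 1
  - Reciprocal rank = 0.5 means the first relevant doc is at rank 2
  - Reciprocal rank = 0.33 means the first relevant doc is at rank 3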
- **Rank Distribution**: Where relevant documents appear in the ranking
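
A short worked example of MRR over several queries, assuming the rank of the first relevant document is already known for each query (`None` when nothing relevant was retrieved). The function name `mean_reciprocal_rank` and the ranks are illustrative only.

```python
from typing import List, Optional


def mean_reciprocal_rank(first_relevant_ranks: List[Optional[int]]) -> float:
    """Average 1/rank over queries; a query with no relevant result contributes 0."""
    if not first_relevant_ranks:
        return 0.0
    reciprocal = [1.0 / r if r else 0.0 for r in first_relevant_ranks]
    return sum(reciprocal) / len(reciprocal)


# Four queries: relevant doc at rank 1, rank 2, not found, rank 4
ranks = [1, 2, None, 4]
mrr = mean_reciprocal_rank(ranks)  # (1 + 0.5 + 0 + 0.25) / 4
print(mrr)
```

Queries with no relevant result stay in the denominator, so MRR penalizes misses as well as low ranks.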

In [None]:
def normalize_text(text: str) -> str:
    """Normalize text for comparison."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = " ".join(text.split())
    return text

def is_reference_present_fuzzy(reference: str, document: str, threshold: float = 0.8) -> bool:
    """Return True if at least `threshold` of the reference tokens occur in the document.

    Matching is order-insensitive: it checks token membership, not contiguous phrases.
    """
    ref_tokens = normalize_text(reference).split()
    doc_tokens = set(normalize_text(document).split())  # set gives O(1) membership checks
    if not ref_tokens:
        return False
    matched_tokens = sum(1 for t in ref_tokens if t in doc_tokens)
    fraction_matched = matched_tokens / len(ref_tokens)
    return fraction_matched >= threshold

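def compute_token_metrics(references: List[str], retrieved_texts: List[str], threshold: float = 0.8) -> Dict[str, float]:
    """Compute token-level metrics between the reference tokens and the retrieved chunks' tokens."""
    all_ref_tokens = []
    all_doc_tokens = []
    any_reference_found = False

    for ref in references:
        all_ref_tokens.extend(normalize_text(ref).split())
        if any(is_reference_present_fuzzy(ref, doc, threshold) for doc in retrieved_texts):
            any_reference_found = True

    # Count retrieved tokens once, and only if at least one reference was matched
    # (adding them once per matched reference would inflate the retrieved-token count)
    if any_reference_found:
        for doc in retrieved_texts:
            all_doc_tokens.extend(normalize_text(doc).split())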
    
    if not all_ref_tokens:
        return {"iou": 0.0, "precision": 0.0, "recall": 0.0, "f1": 0.0}

    ref_counter = Counter(all_ref_tokens)
    doc_counter = Counter(all_doc_tokens)

    intersection_tokens = ref_counter & doc_counter
    intersection_count = sum(intersection_tokens.values())

    ref_count = sum(ref_counter.values())
    doc_count = sum(doc_counter.values())
    union_count = ref_count + doc_count - intersection_count

    iou = intersection_count / union_count if union_count > 0 else 0.0
    precision = intersection_count / doc_count if doc_count > 0 else 0.0
    recall = intersection_count / ref_count if ref_count > 0 else 0.0
    f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0.0

    return {"iou": iou, "precision": precision, "recall": recall, "f1": f1}

def compute_passage_metrics(references: List[str], retrieved_texts: List[str], threshold: float = 0.8) -> Dict[str, float]:
    """Compute passage-level metrics."""
    if not references or not retrieved_texts:
        return {"coverage": 0.0, "accuracy": 0.0, "precision": 0.0, "recall": 0.0, "f1": 0.0}
    
    found_references = sum(1 for ref in references if any(is_reference_present_fuzzy(ref, doc, threshold) for doc in retrieved_texts))
    relevant_retrieved = sum(1 for doc in retrieved_texts if any(is_reference_present_fuzzy(ref, doc, threshold) for ref in references))
    
    coverage = found_references / len(references)
    accuracy = 1.0 if found_references == len(references) else 0.0
    precision = relevant_retrieved / len(retrieved_texts) if retrieved_texts else 0.0
    recall = coverage
    f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0.0
    
    return {"coverage": coverage, "accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def compute_document_metrics(source_file_path: str, retrieved_file_paths: List[str]) -> Dict[str, float]:
    """Compute document-level metrics."""
    if not retrieved_file_paths:
        # Return the same keys as the non-empty case
        return {"coverage": 0.0, "precision": 0.0, "source_chunks_count": 0}
    
    source_chunks_retrieved = sum(1 for fp in retrieved_file_paths if fp == source_file_path)
    coverage = 1.0 if source_chunks_retrieved > 0 else 0.0
    precision = source_chunks_retrieved / len(retrieved_file_paths)
    
    return {"coverage": coverage, "precision": precision, "source_chunks_count": source_chunks_retrieved}

def compute_mrr(references: List[str], retrieved_texts: List[str], threshold: float = 0.8) -> Dict[str, Optional[float]]:
    """Compute the reciprocal rank for one query; averaging it over all queries gives MRR."""
    for rank, doc in enumerate(retrieved_texts, start=1):
        if any(is_reference_present_fuzzy(ref, doc, threshold) for ref in references):
            return {"reciprocal_rank": 1.0 / rank, "first_relevant_rank": rank}
    return {"reciprocal_rank": 0.0, "first_relevant_rank": None}

print("✓ Metric functions defined")
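
The `threshold` argument decides how much of a reference must appear in a chunk before it counts as a match, so the same chunk can be relevant at one setting and not at another. The sketch below restates the matching rule in a self-contained helper with a hypothetical name (`matched_fraction`); the two sentences are made up for illustration.

```python
import string
from typing import List


def matched_fraction(reference: str, document: str) -> float:
    """Fraction of reference tokens that also occur somewhere in the document."""
    def tokens(text: str) -> List[str]:
        # Lowercase, strip punctuation, split on whitespace
        cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
        return cleaned.split()

    ref = tokens(reference)
    doc = set(tokens(document))
    return sum(t in doc for t in ref) / len(ref) if ref else 0.0


reference = "Qdrant stores vectors in collections."
chunk = "Qdrant keeps vectors inside collections."
fraction = matched_fraction(reference, chunk)

print(f"matched fraction: {fraction:.2f}")
print(f"match at threshold 0.5: {fraction >= 0.5}")
print(f"match at threshold 0.8: {fraction >= 0.8}")
```

Three of the five reference tokens occur in the chunk, so it counts as a match at 0.5 but not at 0.8. Because the check ignores word order, a low threshold can also accept chunks that merely share common words with the reference.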

## Retrieval Function

In [None]:
def retrieve_documents(query: str, k: int = 20) -> List[Dict]:
    """Retrieve up to k documents from Qdrant; hits scoring below 0.3 are filtered out, so fewer than k may be returned."""
    query_embedding = embedder.encode([query])[0].tolist()
    
    results = qdrant_client.query_points(
        collection_name=collection_name,
        query=query_embedding,
        limit=k,
        score_threshold=0.3
    )
    
    documents = []
    for point in results.points:
        documents.append({
            'content': point.payload.get('content', '') or point.payload.get('text', ''),
            'file_path': point.payload.get('file_path', ''),
            'score': point.score
        })
    
    return documents

print("✓ Retrieval function defined")

## Demonstration: Re-ranking in Action

Let's see how re-ranking changes the order of retrieved documents for a sample query.

In [None]:
# Take first Q&A pair for demonstration
if qa_dataset:
    demo_qa = qa_dataset[0]
    demo_query = demo_qa['question']
    
    print(f"Query: {demo_query}")
    print(f"\n{'='*80}")
    
    # Retrieve documents
    retrieved_docs = retrieve_documents(demo_query, k=10)
    
    print(f"\nRETRIEVED DOCUMENTS (Embedding-based ranking)")
    print(f"{'='*80}")
    for i, doc in enumerate(retrieved_docs[:5], 1):
        print(f"\n[{i}] Score: {doc['score']:.4f}")
        print(f"    File: {doc['file_path'].split('/')[-1] if doc['file_path'] else 'N/A'}")
        print(f"    Content: {doc['content'][:150]}...")
    
    # Re-rank
    doc_contents = [d['content'] for d in retrieved_docs]
    reranked_contents, reranked_scores = reranker.rerank(demo_query, doc_contents)
    
    # Map back to original documents and track original positions
    content_to_doc = {d['content']: d for d in retrieved_docs}
    content_to_original_pos = {d['content']: idx + 1 for idx, d in enumerate(retrieved_docs)}
    reranked_docs = [content_to_doc[content] for content in reranked_contents]
    
    print(f"\n\nRE-RANKED DOCUMENTS (Re-ranker scores)")
    print(f"{'='*80}")
    for i, (doc, score) in enumerate(zip(reranked_docs[:5], reranked_scores[:5]), 1):
        original_pos = content_to_original_pos[doc['content']]
        position_change = original_pos - i
        
        if position_change > 0:
            change_indicator = f"↑ +{position_change}"
        elif position_change < 0:
            change_indicator = f"↓ {position_change}"
        else:
            change_indicator = "="
        
        print(f"\n[{i}] Position: #{original_pos} → #{i} ({change_indicator})")
        print(f"    Rerank Score: {score:.4f} | Original Score: {doc['score']:.4f}")
        print(f"    File: {doc['file_path'].split('/')[-1] if doc['file_path'] else 'N/A'}")
        print(f"    Content: {doc['content'][:150]}...")
    
    print(f"\n{'='*80}")
    print("\nNotice how the ranking changes! Re-ranking often moves more relevant documents to the top.")
else:
    print("No Q&A dataset available for demonstration")

## Comprehensive Evaluation: With vs Without Re-ranking

We'll evaluate the full Q&A dataset with both approaches:

1. **Baseline**: Retrieval without re-ranking (top-K by embedding similarity)
2. **With Re-ranking**: Retrieve top-20, then re-rank to get final top-K

Configuration:
- Retrieve: Top-20 candidates
- Evaluate: Top-5 after re-ranking
- Metrics: All levels including MRR

In [None]:
# Configuration
RETRIEVE_K = 20  # Retrieve more candidates for re-ranking
EVAL_K = 5       # Evaluate top-5 after re-ranking
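THRESHOLD = 0.5  # Min. fraction of reference tokens a chunk must contain to count as a match (looser than the 0.8 default)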

results_baseline = []
results_reranked = []

print(f"Evaluating {len(qa_dataset)} Q&A pairs...")
print(f"Retrieval: top-{RETRIEVE_K}, Evaluation: top-{EVAL_K}\n")

for idx, qa_pair in enumerate(tqdm(qa_dataset, desc="Evaluating")):
    question = qa_pair["question"]
    file_path = qa_pair.get("file_path", "")
    references_info = qa_pair["references"]
    reference_texts = [ref_info["text"] for ref_info in references_info]
    
    # Retrieve documents
    retrieved_docs = retrieve_documents(question, k=RETRIEVE_K)
    
    # ========================================================================
    # BASELINE: Use top-K from retrieval directly
    # ========================================================================
    baseline_docs = retrieved_docs[:EVAL_K]
    baseline_texts = [d['content'] for d in baseline_docs]
    baseline_file_paths = [d['file_path'] for d in baseline_docs]
    
    # Compute all metrics
    token_metrics = compute_token_metrics(reference_texts, baseline_texts, threshold=THRESHOLD)
    passage_metrics = compute_passage_metrics(reference_texts, baseline_texts, threshold=THRESHOLD)
    document_metrics = compute_document_metrics(file_path, baseline_file_paths)
    mrr_metrics = compute_mrr(reference_texts, baseline_texts, threshold=THRESHOLD)
    
    results_baseline.append({
        "question": question,
        "k": EVAL_K,
        # Token-level
        "token_iou": token_metrics["iou"],
        "token_precision": token_metrics["precision"],
        "token_recall": token_metrics["recall"],
        "token_f1": token_metrics["f1"],
        # Passage-level
        "passage_coverage": passage_metrics["coverage"],
        "passage_accuracy": passage_metrics["accuracy"],
        "passage_precision": passage_metrics["precision"],
        "passage_recall": passage_metrics["recall"],
        "passage_f1": passage_metrics["f1"],
        # Document-level
        "doc_coverage": document_metrics["coverage"],
        "doc_precision": document_metrics["precision"],
        # Ranking quality (MRR)
        "reciprocal_rank": mrr_metrics["reciprocal_rank"],
        "first_relevant_rank": mrr_metrics["first_relevant_rank"]
    })
    
    # ========================================================================
    # WITH RE-RANKING: Re-rank all retrieved docs, then take top-K
    # ========================================================================
    doc_contents = [d['content'] for d in retrieved_docs]
    reranked_contents, reranked_scores = reranker.rerank(question, doc_contents)
    
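    # Map back to original documents. Identical chunk texts can occur more than once,
    # so keep a queue of documents per content string instead of a plain dict.
    docs_by_content = {}
    for d in retrieved_docs:
        docs_by_content.setdefault(d['content'], []).append(d)
    reranked_docs = [docs_by_content[content].pop(0) for content in reranked_contents]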
    
    # Take top-K after re-ranking
    reranked_docs_topk = reranked_docs[:EVAL_K]
    reranked_texts = [d['content'] for d in reranked_docs_topk]
    reranked_file_paths = [d['file_path'] for d in reranked_docs_topk]
    
    # Compute all metrics
    token_metrics_rr = compute_token_metrics(reference_texts, reranked_texts, threshold=THRESHOLD)
    passage_metrics_rr = compute_passage_metrics(reference_texts, reranked_texts, threshold=THRESHOLD)
    document_metrics_rr = compute_document_metrics(file_path, reranked_file_paths)
    mrr_metrics_rr = compute_mrr(reference_texts, reranked_texts, threshold=THRESHOLD)
    
    results_reranked.append({
        "question": question,
        "k": EVAL_K,
        # Token-level
        "token_iou": token_metrics_rr["iou"],
        "token_precision": token_metrics_rr["precision"],
        "token_recall": token_metrics_rr["recall"],
        "token_f1": token_metrics_rr["f1"],
        # Passage-level
        "passage_coverage": passage_metrics_rr["coverage"],
        "passage_accuracy": passage_metrics_rr["accuracy"],
        "passage_precision": passage_metrics_rr["precision"],
        "passage_recall": passage_metrics_rr["recall"],
        "passage_f1": passage_metrics_rr["f1"],
        # Document-level
        "doc_coverage": document_metrics_rr["coverage"],
        "doc_precision": document_metrics_rr["precision"],
        # Ranking quality (MRR)
        "reciprocal_rank": mrr_metrics_rr["reciprocal_rank"],
        "first_relevant_rank": mrr_metrics_rr["first_relevant_rank"]
    })

df_baseline = pd.DataFrame(results_baseline)
df_reranked = pd.DataFrame(results_reranked)

print(f"\n✓ Evaluation complete!")

## Results Comparison

In [None]:
print(f"\n{'='*80}")
print(f"EVALUATION RESULTS @ K={EVAL_K}")
print(f"{'='*80}")

print(f"\n{'─'*80}")
print("TOKEN-LEVEL METRICS")
print(f"{'─'*80}")
print(f"{'Metric':<20} {'Baseline':>12} {'Re-ranked':>12} {'Improvement':>12}")
print(f"{'-'*60}")
for metric in ['token_iou', 'token_precision', 'token_recall', 'token_f1']:
    baseline_val = df_baseline[metric].mean()
    reranked_val = df_reranked[metric].mean()
    improvement = reranked_val - baseline_val
    print(f"{metric:<20} {baseline_val:>12.4f} {reranked_val:>12.4f} {improvement:>+12.4f}")

print(f"\n{'─'*80}")
print("PASSAGE-LEVEL METRICS")
print(f"{'─'*80}")
print(f"{'Metric':<20} {'Baseline':>12} {'Re-ranked':>12} {'Improvement':>12}")
print(f"{'-'*60}")
for metric in ['passage_coverage', 'passage_accuracy', 'passage_precision', 'passage_recall', 'passage_f1']:
    baseline_val = df_baseline[metric].mean()
    reranked_val = df_reranked[metric].mean()
    improvement = reranked_val - baseline_val
    print(f"{metric:<20} {baseline_val:>12.4f} {reranked_val:>12.4f} {improvement:>+12.4f}")

print(f"\n{'─'*80}")
print("DOCUMENT-LEVEL METRICS")
print(f"{'─'*80}")
print(f"{'Metric':<20} {'Baseline':>12} {'Re-ranked':>12} {'Improvement':>12}")
print(f"{'-'*60}")
for metric in ['doc_coverage', 'doc_precision']:
    baseline_val = df_baseline[metric].mean()
    reranked_val = df_reranked[metric].mean()
    improvement = reranked_val - baseline_val
    print(f"{metric:<20} {baseline_val:>12.4f} {reranked_val:>12.4f} {improvement:>+12.4f}")

print(f"\n{'─'*80}")
print("RANKING QUALITY METRICS")
print(f"{'─'*80}")
print(f"{'Metric':<20} {'Baseline':>12} {'Re-ranked':>12} {'Improvement':>12}")
print(f"{'-'*60}")
baseline_mrr = df_baseline['reciprocal_rank'].mean()
reranked_mrr = df_reranked['reciprocal_rank'].mean()
mrr_improvement = reranked_mrr - baseline_mrr
print(f"{'MRR':<20} {baseline_mrr:>12.4f} {reranked_mrr:>12.4f} {mrr_improvement:>+12.4f}")

print(f"\n{'='*80}")

# Overall summary (differences in percentage points; negative values mean re-ranking did worse)
print(f"\nOVERALL SUMMARY:")
print(f"  Re-ranking vs. baseline:")
print(f"    - Passage Coverage: {(df_reranked['passage_coverage'].mean() - df_baseline['passage_coverage'].mean())*100:+.1f} pp")
print(f"    - MRR: {mrr_improvement:+.4f} (higher is better)")
print(f"    - Document Precision: {(df_reranked['doc_precision'].mean() - df_baseline['doc_precision'].mean())*100:+.1f} pp")

## Rank Distribution Analysis

Where do relevant documents appear in the ranking?

In [None]:
print(f"\n{'='*80}")
print("RANK DISTRIBUTION: Where do relevant documents appear?")
print(f"{'='*80}")

# Baseline rank distribution
baseline_ranks = df_baseline['first_relevant_rank'].dropna()
reranked_ranks = df_reranked['first_relevant_rank'].dropna()

print(f"\nBASELINE (No re-ranking):")
if len(baseline_ranks) > 0:
    rank_counts_baseline = baseline_ranks.value_counts().sort_index()
    for rank, count in rank_counts_baseline.items():
        pct = (count / len(df_baseline)) * 100
        print(f"  Rank {int(rank)}: {count} queries ({pct:.1f}%)")
    print(f"  No relevant doc found: {len(df_baseline) - len(baseline_ranks)} queries ({(len(df_baseline) - len(baseline_ranks))/len(df_baseline)*100:.1f}%)")
else:
    print("  No relevant documents found")

print(f"\nWITH RE-RANKING:")
if len(reranked_ranks) > 0:
    rank_counts_reranked = reranked_ranks.value_counts().sort_index()
    for rank, count in rank_counts_reranked.items():
        pct = (count / len(df_reranked)) * 100
        print(f"  Rank {int(rank)}: {count} queries ({pct:.1f}%)")
    print(f"  No relevant doc found: {len(df_reranked) - len(reranked_ranks)} queries ({(len(df_reranked) - len(reranked_ranks))/len(df_reranked)*100:.1f}%)")
else:
    print("  No relevant documents found")

print(f"\n{'='*80}")
print("\nKey insight: if re-ranking helps, more queries should have their first relevant document at Rank 1.")

## Conclusion

Re-ranking can improve RAG retrieval quality in three ways; the results tables above show how much it helps on this dataset:

1. **Better Ranking Quality (MRR)**
   - Moves relevant documents closer to the top
   - Higher MRR = relevant docs appear earlier in results
   - Critical for user experience and LLM context

2. **Improved Passage Coverage**
   - More reference passages found in top-K results
   - Better recall of relevant information

3. **Higher Document Precision**
   - More focused results from source documents
   - Less noise from irrelevant documents

### When to Use Re-ranking

**Use re-ranking when:**
- Precision is critical (e.g., question answering)
- You need the best documents in top positions
- You have computational resources for the second stage
- Quality is more important than speed

