# Reranking: Improving Search Quality with Cross-Encoders

## Introduction to Reranking

While basic vector similarity works well for initial retrieval, production RAG systems often use reranking to improve result quality. This notebook focuses on reranking techniques that can significantly improve search relevance.

### What is Reranking?

- **Two-Stage Process**: Fast initial retrieval followed by sophisticated reranking
- **Cross-Encoders**: Models that see both query and document together
- **Quality Improvement**: Often cited as giving 15-30% better results, though the actual gain depends on the data, the models, and the metric used
- **Trade-off**: Better quality at the cost of increased latency

## Learning Objectives

By the end of this notebook, you will:
1. Understand how reranking works and why it's effective
2. Implement a two-stage retrieval system
3. Learn about cross-encoder models and their advantages
4. Understand the performance vs quality trade-offs
5. Know when to use reranking in production systems


In [1]:
# Essential imports for advanced search techniques
import json
import numpy as np
import pandas as pd
from pathlib import Path
from typing import List, Dict, Any, Optional, Tuple
import time
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer, CrossEncoder
from sklearn.feature_extraction.text import TfidfVectorizer
import re
from collections import Counter
import warnings
warnings.filterwarnings('ignore')

# Add project root to path
import sys
sys.path.append(str(Path.cwd().parent))

# Import our modules
from src.config import DATA_DIR

# Set up plotting
plt.style.use('default')
sns.set_palette("husl")

print("Advanced Vector Search Libraries Loaded!")
print(f"Data directory: {DATA_DIR}")

# Load our processed data
processed_dir = DATA_DIR / "processed"
chunks_file = processed_dir / "all_chunks.json"

if chunks_file.exists():
    with open(chunks_file, 'r', encoding='utf-8') as f:
        all_chunks = json.load(f)
    print(f"Loaded {len(all_chunks)} chunks for advanced search experiments")
else:
    print("No processed chunks found. Please run the data collection notebook first.")
    all_chunks = []


Advanced Vector Search Libraries Loaded!
Data directory: /Users/scienceman/Desktop/LLM/data
Loaded 18 chunks for advanced search experiments


## 1. Reranking: Making Search Results Even Better

### The Reranking Problem

Vector similarity is a coarse relevance signal, so the top results from basic vector search are not always the most relevant ones. Reranking applies a slower, more accurate model to a short list of candidates and reorders them.

### Two-Stage Retrieval Process

1. **Stage 1**: Fast retrieval (vector search) → Get 50-100 candidates
2. **Stage 2**: Slow reranking (cross-encoder) → Rescore those candidates and keep the top 10-20

**Why This Works**: Cross-encoders see both query and document together, making better relevance judgments.
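The two stages above can be sketched independently of any particular model. The following is a minimal illustration, not the implementation used later in this notebook: `two_stage_search`, `word_overlap`, and `overlap_ratio` are hypothetical names, and the two toy scorers only stand in for a vector search and a cross-encoder.

```python
from typing import Callable, List, Tuple


def two_stage_search(
    query: str,
    documents: List[str],
    cheap_score: Callable[[str, str], float],
    expensive_score: Callable[[str, str], float],
    n_candidates: int = 50,
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Shortlist with cheap_score, then reorder the shortlist with expensive_score."""
    # Stage 1: score every document with the cheap function, keep the best few.
    candidates = sorted(
        documents, key=lambda doc: cheap_score(query, doc), reverse=True
    )[:n_candidates]
    # Stage 2: the expensive function only ever sees the shortlist.
    rescored = [(doc, expensive_score(query, doc)) for doc in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:top_k]


# Toy stand-ins (hypothetical): shared-word count for stage 1,
# share of document words that appear in the query for stage 2.
def word_overlap(query: str, doc: str) -> float:
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def overlap_ratio(query: str, doc: str) -> float:
    query_words = set(query.lower().split())
    doc_words = doc.lower().split()
    if not doc_words:
        return 0.0
    return sum(1 for word in doc_words if word in query_words) / len(doc_words)


docs = [
    "learning to cook pasta at home with machine precision and many other unrelated words",
    "machine learning is a field of study",
    "the weather is nice",
]
results = two_stage_search(
    "machine learning", docs, word_overlap, overlap_ratio, n_candidates=2, top_k=1
)
print(results[0][0])
```

The cost of stage 2 scales with `n_candidates` rather than with the size of the collection, which is what makes an expensive scorer affordable.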


In [2]:
# Load models for reranking
print("Loading models for reranking...")
dense_model = SentenceTransformer('all-MiniLM-L6-v2')  # For initial retrieval
print(f"Dense model loaded: {dense_model.get_sentence_embedding_dimension()}D")

class BasicSearchEngine:
    def __init__(self, dense_model):
        self.dense_model = dense_model
        
    def index_documents(self, documents):
        """Index documents for dense search."""
        self.documents = documents
        
        # Dense embeddings
        print("Generating dense embeddings...")
        self.dense_embeddings = self.dense_model.encode(documents)
        
        print(f"Indexed {len(documents)} documents")
        
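    def search(self, query, top_k=5):
        """Perform basic dense search."""
        # Nothing to search until index_documents() has been called
        if getattr(self, 'dense_embeddings', None) is None or len(self.documents) == 0:
            return []
        
        # Dense search: cosine similarity between the query and every document
        query_embedding = self.dense_model.encode([query])
        dense_scores = cosine_similarity(query_embedding, self.dense_embeddings)[0]
        
        # Get top results, highest similarity first
        top_indices = np.argsort(dense_scores)[::-1][:top_k]
        
        results = []
        for idx in top_indices:
            results.append({
                'document': self.documents[idx],
                # Convert NumPy scalars to plain Python types
                'similarity': float(dense_scores[idx]),
                'index': int(idx)
            })
            
        return results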

# Initialize basic search engine
search_engine = BasicSearchEngine(dense_model)

# Index our documents
if all_chunks:
    documents = [chunk['text'] for chunk in all_chunks[:10]]  # Use first 10 chunks
    search_engine.index_documents(documents)
    print(f"Basic search engine ready with {len(documents)} documents!")
else:
    print("No documents available for indexing.")


Loading models for reranking...
Dense model loaded: 384D
Generating dense embeddings...
Indexed 10 documents
Basic search engine ready with 10 documents!


## 2. Understanding Cross-Encoders

### Why Cross-Encoders Work Better

**The Key Insight**: Cross-encoders see both the query and document together, allowing them to make more sophisticated relevance judgments than bi-encoders.

**Bi-Encoder Limitations**:
- Encode query and document separately
- Only compare final embeddings
- Miss subtle interactions between query and document terms
- Must compress each text into a single fixed-size vector (384 dimensions for the model used here)

**Cross-Encoder Advantages**:
- Process query-document pairs together
- Capture complex interactions and dependencies
- Better understanding of context and nuance
- More accurate relevance scoring

### Cross-Encoder Architecture

Cross-encoders use transformer models to process query-document pairs:
1. **Input**: Concatenated query and document text
2. **Processing**: Full transformer attention between all tokens
3. **Output**: Single relevance score for the pair
4. **Training**: Trained on query-document relevance labels
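To make the "concatenated input" step concrete, here is a simplified sketch of how a query-document pair is laid out for a BERT-style model. It assumes the `[CLS]`/`[SEP]` convention and segment ids used by BERT-family tokenizers, and it uses whitespace splitting in place of a real subword tokenizer; `build_pair_input` is a hypothetical helper, not part of any library.

```python
from typing import List, Tuple


def build_pair_input(query: str, document: str) -> Tuple[List[str], List[int]]:
    """Lay out a query-document pair as one sequence with segment ids."""
    # Whitespace splitting stands in for a real subword tokenizer (assumption).
    query_tokens = query.lower().split()
    doc_tokens = document.lower().split()
    tokens = ["[CLS]"] + query_tokens + ["[SEP]"] + doc_tokens + ["[SEP]"]
    # Segment ids mark which side each token came from:
    # 0 for [CLS], the query and its [SEP]; 1 for the document and its [SEP].
    segment_ids = [0] * (len(query_tokens) + 2) + [1] * (len(doc_tokens) + 1)
    return tokens, segment_ids


tokens, segment_ids = build_pair_input("what is ml", "ML is a field of study")
print(tokens)
print(segment_ids)
```

Because both texts sit in one sequence, every query token can attend to every document token, which is the interaction a bi-encoder cannot model.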


In [3]:
# Load a cross-encoder for reranking
print("Loading cross-encoder for reranking...")
cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')
print("Cross-encoder loaded!")

class RerankedSearchEngine:
    def __init__(self, basic_engine, cross_encoder):
        self.basic_engine = basic_engine
        self.cross_encoder = cross_encoder
        
    def search_with_reranking(self, query, top_k=5, rerank_top_k=20):
        """Search with reranking for better results."""
        # Stage 1: Get more candidates than needed
        candidates = self.basic_engine.search(query, top_k=rerank_top_k)
        
        if not candidates:
            return []
            
        # Stage 2: Rerank using cross-encoder
        query_doc_pairs = [(query, result['document']) for result in candidates]
        rerank_scores = self.cross_encoder.predict(query_doc_pairs)
        
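        # Combine original scores with rerank scores
        for i, result in enumerate(candidates):
            result['rerank_score'] = float(rerank_scores[i])
            # Weighted combination: 30% original, 70% rerank.
            # Caveat: rerank_score is unbounded (the output shows values above 10
            # and below 0), while similarity is a cosine in [-1, 1], so the rerank
            # term dominates and the weights are only nominal unless the two
            # scores are first put on a common scale.
            result['final_score'] = 0.3 * result['similarity'] + 0.7 * result['rerank_score']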
        
        # Sort by final score and return top_k
        candidates.sort(key=lambda x: x['final_score'], reverse=True)
        return candidates[:top_k]

# Create reranked search engine
reranked_engine = RerankedSearchEngine(search_engine, cross_encoder)

# Test basic search vs reranked search
def compare_search_methods(query, top_k=3):
    """Compare different search methods."""
    print(f"\nQuery: '{query}'")
    print("=" * 60)
    
    # Basic search only
    basic_results = search_engine.search(query, top_k=top_k)
    print(f"\nBasic Search Results:")
    for i, result in enumerate(basic_results):
        print(f"{i+1}. Score: {result['similarity']:.3f} | {result['document'][:80]}...")
    
    # Reranked search
    reranked_results = reranked_engine.search_with_reranking(query, top_k=top_k)
    print(f"\nReranked Search Results:")
    for i, result in enumerate(reranked_results):
        print(f"{i+1}. Final: {result['final_score']:.3f} | Rerank: {result['rerank_score']:.3f} | {result['document'][:80]}...")

# Test with sample queries
if all_chunks:
    test_queries = [
        "What is machine learning?",
        "How do neural networks work?",
        "Tell me about artificial intelligence"
    ]
    
    for query in test_queries:
        compare_search_methods(query)
else:
    print("No documents available for testing.")


Loading cross-encoder for reranking...
Cross-encoder loaded!

Query: 'What is machine learning?'

Basic Search Results:
1. Score: 0.748 | Machine learning (ML) is a field of study in artificial intelligence concerned w...
2. Score: 0.583 | Artificial intelligence (AI) is the capability of computational systems to perfo...
3. Score: 0.474 | In machine learning, deep learning focuses on utilizing multilayered neural netw...

Reranked Search Results:
1. Final: 8.153 | Rerank: 11.327 | Machine learning (ML) is a field of study in artificial intelligence concerned w...
2. Final: 3.444 | Rerank: 4.717 | In machine learning, deep learning focuses on utilizing multilayered neural netw...
3. Final: -0.004 | Rerank: -0.255 | Artificial intelligence (AI) is the capability of computational systems to perfo...

Query: 'How do neural networks work?'

Basic Search Results:
1. Score: 0.512 | In machine learning, deep learning focuses on utilizing multilayered neural netw...
2. Score: 0.482 | Machine l
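The rerank scores in the output above range from about -0.3 to above 11, while cosine similarities stay within [-1, 1], so the 30/70 weighting is dominated by the rerank term. One possible remedy, sketched here as an assumption rather than as the notebook's method, is to squash the rerank score with a sigmoid before blending. `sigmoid` and `combine_scores` are hypothetical helpers; the score pairs are copied from the first query's output.

```python
import math


def sigmoid(x: float) -> float:
    """Map an unbounded score to (0, 1) without overflowing for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def combine_scores(similarity: float, rerank_score: float, rerank_weight: float = 0.7) -> float:
    """Blend a cosine similarity with a sigmoid-squashed rerank score."""
    return (1.0 - rerank_weight) * similarity + rerank_weight * sigmoid(rerank_score)


# (similarity, rerank score) pairs taken from the "What is machine learning?" output
pairs = [(0.748, 11.327), (0.474, 4.717), (0.583, -0.255)]
for similarity, rerank_score in pairs:
    raw = 0.3 * similarity + 0.7 * rerank_score
    blended = combine_scores(similarity, rerank_score)
    print(f"raw: {raw:7.3f} | normalized: {blended:.3f}")
```

With both terms bounded, the weights describe the actual balance between the two signals. Sorting by the rerank score alone is a simpler alternative when the initial similarity adds little.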

## Summary: Reranking for Better Search Quality

### What We've Learned

1. **Two-Stage Retrieval**: Fast initial retrieval followed by sophisticated reranking
2. **Cross-Encoders**: Models that see both query and document together for better relevance
3. **Quality Improvement**: Reranking is often cited as giving 15-30% better results; measure the gain on your own data
4. **Architecture**: Cross-encoders process query-document pairs with full attention

### Production Considerations

**Performance vs Quality Trade-offs**:
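- **Quality**: roughly +15-30% improvement in search relevance (indicative figure; measure on your own data)
- **Latency**: roughly +200-300% increase in response time (indicative figure; grows with the number of candidates reranked)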
- **Best for**: Quality-critical applications where accuracy matters more than speed

**When to Use Reranking**:
- **High-quality requirements**: When search quality is more important than speed
- **Small result sets**: When you only need top 5-10 results
- **Complex queries**: When simple similarity isn't enough
- **Production systems**: Where user satisfaction depends on result quality

**Implementation Tips**:
- Rerank only a shortlist of candidates (for example 20-100), never the whole collection
- Balance rerank_top_k based on your latency requirements
- Consider caching rerank results for popular queries
- Monitor both quality and latency metrics
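The caching and latency tips can be prototyped with the standard library alone. This is a minimal sketch under stated assumptions: `make_cached_reranker`, `timed`, and `slow_rerank` are hypothetical names, and `slow_rerank` simulates an expensive model with a short sleep instead of calling a real cross-encoder.

```python
import time
from functools import lru_cache
from typing import Callable, List, Tuple


def make_cached_reranker(
    rerank: Callable[[str, Tuple[str, ...]], List[float]],
    maxsize: int = 1024,
) -> Callable[[str, Tuple[str, ...]], Tuple[float, ...]]:
    """Wrap a rerank function so repeated (query, candidates) calls hit a cache."""

    @lru_cache(maxsize=maxsize)
    def cached(query: str, candidates: Tuple[str, ...]) -> Tuple[float, ...]:
        # lru_cache needs hashable arguments, hence tuples rather than lists.
        return tuple(rerank(query, candidates))

    return cached


def timed(fn: Callable, *args):
    """Return (result, elapsed_seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Hypothetical slow scorer standing in for an expensive reranking model.
def slow_rerank(query: str, candidates: Tuple[str, ...]) -> List[float]:
    time.sleep(0.05)
    query_words = set(query.split())
    return [float(len(query_words & set(c.split()))) for c in candidates]


cached_rerank = make_cached_reranker(slow_rerank)
docs = ("machine learning basics", "cooking pasta")
_, cold = timed(cached_rerank, "machine learning", docs)
_, warm = timed(cached_rerank, "machine learning", docs)
print(f"cold: {cold * 1000:.1f} ms | warm: {warm * 1000:.1f} ms")
```

A cache like this must be cleared whenever the indexed documents change, since the key only covers the query and the candidate texts passed in.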


In [4]:
# Test reranking effectiveness
def test_reranking_effectiveness():
    """Test how reranking improves search results."""
    print("\nReranking Effectiveness Test")
    print("=" * 50)
    
    # Test with a sample query
    query = "What is machine learning?"
    
    # Basic search
    basic_results = search_engine.search(query, top_k=5)
    print(f"\nBasic Search Results:")
    for i, result in enumerate(basic_results):
        print(f"{i+1}. Score: {result['similarity']:.3f} | {result['document'][:60]}...")
    
    # Reranked search
    reranked_results = reranked_engine.search_with_reranking(query, top_k=5)
    print(f"\nReranked Search Results:")
    for i, result in enumerate(reranked_results):
        print(f"{i+1}. Final: {result['final_score']:.3f} | Rerank: {result['rerank_score']:.3f} | {result['document'][:60]}...")
    
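    # Show improvement
    # The two scores are on different scales (cosine similarity vs. a sum
    # dominated by unbounded rerank scores), so a percentage difference between
    # them is not a quality measure. Compare the rankings instead.
    if basic_results and reranked_results:
        print(f"\nImprovement Analysis:")
        print(f"- Basic search top result: {basic_results[0]['similarity']:.3f}")
        print(f"- Reranked top result: {reranked_results[0]['final_score']:.3f}")
        basic_order = [r['index'] for r in basic_results]
        reranked_order = [r['index'] for r in reranked_results]
        moved = sum(b != r for b, r in zip(basic_order, reranked_order))
        print(f"- Positions that changed after reranking: {moved} of {len(basic_order)}")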

# Run the test
if all_chunks:
    test_reranking_effectiveness()
else:
    print("No documents available for testing.")



Reranking Effectiveness Test

Basic Search Results:
1. Score: 0.748 | Machine learning (ML) is a field of study in artificial inte...
2. Score: 0.583 | Artificial intelligence (AI) is the capability of computatio...
3. Score: 0.474 | In machine learning, deep learning focuses on utilizing mult...
4. Score: 0.404 | Natural language processing (NLP) is the processing of natur...
5. Score: 0.364 | Computer vision tasks include methods for acquiring, process...

Reranked Search Results:
1. Final: 8.153 | Rerank: 11.327 | Machine learning (ML) is a field of study in artificial inte...
2. Final: 3.444 | Rerank: 4.717 | In machine learning, deep learning focuses on utilizing mult...
3. Final: -0.004 | Rerank: -0.255 | Artificial intelligence (AI) is the capability of computatio...
4. Final: -4.350 | Rerank: -6.387 | Natural language processing (NLP) is the processing of natur...
5. Final: -5.643 | Rerank: -8.139 | The human instruction is decomposed into a directed acyclic ...

Improvement A
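Comparing a cosine similarity with a reranked score does not show whether the ranking got better, because the two numbers are on different scales. A more reliable check needs relevance labels. The sketch below computes reciprocal rank and precision@k for two rankings; the document ids and the `relevant` set are hypothetical and serve only as an illustration.

```python
from typing import List, Set


def reciprocal_rank(ranked_ids: List[int], relevant_ids: Set[int]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was returned."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def precision_at_k(ranked_ids: List[int], relevant_ids: Set[int], k: int) -> float:
    """Fraction of the top k results that are relevant."""
    if k <= 0:
        return 0.0
    top = ranked_ids[:k]
    return sum(1 for doc_id in top if doc_id in relevant_ids) / k


# Hypothetical document ids and relevance labels, for illustration only.
relevant = {0, 2}
basic_ranking = [0, 1, 2, 3, 4]
reranked_ranking = [0, 2, 1, 3, 5]
for name, ranking in [("basic", basic_ranking), ("reranked", reranked_ranking)]:
    rr = reciprocal_rank(ranking, relevant)
    p2 = precision_at_k(ranking, relevant, k=2)
    print(f"{name}: reciprocal rank = {rr:.2f}, precision@2 = {p2:.2f}")
```

Averaging these values over a set of labeled queries gives a figure that can be tracked alongside latency.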

### Next Steps

In the next notebook, we'll explore hybrid search techniques that combine dense and sparse retrieval methods, along with query expansion strategies, to build comprehensive retrieval systems for production RAG applications.
