# Bi-Encoder vs Cross-Encoder for Semantic Search

**Learning Objectives:**
- Understand the difference between bi-encoder and cross-encoder architectures
- Implement cosine similarity for vector-based search
- Compare performance and use cases for both approaches
- Learn when to use each approach in RAG pipelines

## Setup

In [None]:
!pip install -q sentence-transformers numpy pandas

In [None]:
from sentence_transformers import CrossEncoder, SentenceTransformer
import numpy as np

## 1. Understanding the Architectures

### Bi-Encoder
- Encodes query and documents **independently**
- Produces fixed-size embeddings (vectors)
- Fast: Document embeddings can be pre-computed once and reused for every query
- Scalable: Embeddings can be indexed in a vector database for efficient nearest-neighbor search

### Cross-Encoder
- Encodes query and document **together** as a pair
- Produces a relevance score directly
- Slow: Must run the model on every (query, document) pair at query time, so nothing can be pre-computed
- More accurate: Captures interaction between query and document

**Best Practice:** Use a bi-encoder for first-stage retrieval and a cross-encoder to rerank the retrieved candidates.
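
To make the speed difference concrete, here is a rough cost model that only counts model forward passes. The function name `forward_passes` and the corpus and query sizes are illustrative assumptions, and real latency also depends on batching, hardware, and sequence length.

```python
def forward_passes(num_docs: int, num_queries: int) -> dict:
    """Count model forward passes per architecture (illustrative cost model only)."""
    return {
        # Each document is encoded once offline, each query once at search time.
        "bi_encoder": num_docs + num_queries,
        # Every (query, document) pair needs its own forward pass.
        "cross_encoder": num_docs * num_queries,
    }


# Hypothetical workload: 100,000 documents and 1,000 queries.
costs = forward_passes(num_docs=100_000, num_queries=1_000)
print(costs)
```

Under these assumptions the bi-encoder cost grows with the sum of documents and queries, while the cross-encoder cost grows with their product, which is why a cross-encoder is usually applied only to a short candidate list.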

## 2. Implement Cosine Similarity

Cosine similarity is the cosine of the angle between two vectors, so it depends on their direction and not on their magnitude. It is one of the most common metrics for comparing semantic embeddings.

In [None]:
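def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """
    Calculate cosine similarity between two vectors.

    Formula: cos(θ) = (a · b) / (||a|| × ||b||)

    Args:
        a: First vector
        b: Second vector

    Returns:
        Similarity score between -1 and 1 (higher is more similar)
    """
    # Dot product divided by the product of the L2 norms.
    # Assumes neither vector is all zeros (the division would be undefined).
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))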


# Test the function
vec1 = np.array([1, 2, 3])
vec2 = np.array([2, 4, 6])  # Same direction as vec1
vec3 = np.array([1, 0, -1])  # Different direction

print(f"Similarity (parallel vectors): {cosine_similarity(vec1, vec2):.4f}")
print(f"Similarity (different vectors): {cosine_similarity(vec1, vec3):.4f}")
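
The formula is undefined when either vector has zero length, and it ignores magnitude entirely. The sketch below illustrates both points with a hypothetical variant named `safe_cosine_similarity`. Returning 0.0 for a zero vector is an assumption made for this example, not a universal convention.

```python
import numpy as np


def safe_cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity that returns 0.0 when either vector has zero length."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # Assumption: treat "no direction" as "no similarity" instead of NaN.
        return 0.0
    return float(np.dot(a, b) / denom)


v = np.array([1, 2, 3])
scaled = safe_cosine_similarity(v, 10 * v)          # scaling a vector keeps its direction
with_zero = safe_cosine_similarity(v, np.zeros(3))  # zero vector handled without NaN
print(f"Scaled copy: {scaled:.4f}")
print(f"Zero vector: {with_zero:.4f}")
```

Embedding models rarely produce an all-zero vector, but a guard like this keeps a ranking loop from silently propagating NaN values if one does appear.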

## 3. Prepare Test Data

In [None]:
query = "What is the French most visited monument?"

answers = [
    "The Eiffel Tower is the most visited monument in France, and is located in Paris",
    "France is an amazing country, where you should live in",
    "Who knows the French most visited monument?",
]

print(f"Query: {query}\n")
for i, ans in enumerate(answers, 1):
    print(f"Answer {i}: {ans}")

## 4. Bi-Encoder Approach

This section uses **all-MiniLM-L6-v2**, a fast, lightweight bi-encoder that maps each text to a 384-dimensional embedding.

In [None]:
bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Encode query and answers independently
query_embedding = bi_encoder.encode(query)
answer_embeddings = bi_encoder.encode(answers)

print(f"Query embedding shape: {query_embedding.shape}")
print(f"Answer embeddings shape: {answer_embeddings.shape}")

In [None]:
# Calculate similarity scores
bi_encoder_scores = [
    (cosine_similarity(query_embedding, emb), ans)
    for emb, ans in zip(answer_embeddings, answers)
]

# Sort by score (descending)
bi_encoder_scores.sort(reverse=True, key=lambda x: x[0])

print("\n" + "=" * 80)
print("BI-ENCODER RESULTS")
print("=" * 80)
for score, answer in bi_encoder_scores:
    print(f"\nScore: {score:.4f}")
    print(f"Answer: {answer}")
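
The loop above scores one answer at a time, which is fine for three answers. For a larger corpus, the same ranking can be sketched as a single matrix product over unit-normalized vectors. The helper `top_k` and the toy two-dimensional vectors are hypothetical and stand in for real embeddings.

```python
import numpy as np


def top_k(query_vec: np.ndarray, doc_matrix: np.ndarray, k: int) -> list:
    """Return (row_index, cosine_score) for the k rows most similar to the query."""
    # After L2 normalization, the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    scores = d @ q                       # one score per document row
    order = np.argsort(-scores)[:k]      # indices sorted from best to worst
    return [(int(i), float(scores[i])) for i in order]


# Toy 2-D "embeddings" for illustration only, not real model output.
toy_docs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
toy_query = np.array([2.0, 0.1])

hits = top_k(toy_query, toy_docs, k=2)
print(hits)
```

In a real system the document rows would be normalized once when the index is built, so only the query needs normalizing at search time.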

## 5. Cross-Encoder Approach

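This section uses **cross-encoder/ms-marco-MiniLM-L-12-v2**, a cross-encoder trained for passage ranking on MS MARCO. Its outputs are relevance scores, not cosine similarities, so compare the ranking it produces rather than the raw values.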

In [None]:
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-12-v2")

# Create (query, answer) pairs and score them
pairs = [(query, answer) for answer in answers]
cross_encoder_scores = cross_encoder.predict(pairs)

# Combine scores with answers and sort
cross_results = list(zip(cross_encoder_scores, answers))
cross_results.sort(reverse=True, key=lambda x: x[0])

print("\n" + "=" * 80)
print("CROSS-ENCODER RESULTS")
print("=" * 80)
for score, answer in cross_results:
    print(f"\nScore: {score:.4f}")
    print(f"Answer: {answer}")

## 6. Side-by-Side Comparison

In [None]:
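import pandas as pd

# bi_encoder_scores was sorted by score, so look each score up by its answer
# to keep every row aligned with the original order of `answers`.
bi_score_by_answer = {ans: score for score, ans in bi_encoder_scores}

comparison = pd.DataFrame(
    {
        "Answer": answers,
        "Bi-Encoder Score": [bi_score_by_answer[ans] for ans in answers],
        # cross_encoder_scores is still in the original order of `answers`.
        "Cross-Encoder Score": list(cross_encoder_scores),
    }
)

print("\n" + "=" * 80)
print("COMPARISON")
print("=" * 80)
print(comparison.to_string(index=False))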
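
The two score columns are on different scales, so a rank per answer is often easier to compare than the raw numbers. The sketch below is one way to convert scores into ranks. The helper `ranks_from_scores` is hypothetical, and the score lists are made-up values, not output from either model.

```python
import numpy as np


def ranks_from_scores(scores) -> np.ndarray:
    """Return 1-based ranks, where rank 1 is the highest score."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores)                     # indices from best to worst
    ranks = np.empty(len(scores), dtype=int)
    ranks[order] = np.arange(1, len(scores) + 1)    # assign rank by position in the ordering
    return ranks


# Made-up scores on two different scales, NOT real model output.
toy_cosine = [0.62, 0.48, 0.71]
toy_relevance = [8.3, -4.1, 1.9]

cosine_ranks = ranks_from_scores(toy_cosine)
relevance_ranks = ranks_from_scores(toy_relevance)
print("Cosine ranks:   ", cosine_ranks)
print("Relevance ranks:", relevance_ranks)
```

Applied to the real columns, ranks like these make it easy to see where the two models disagree about the ordering.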

## 7. Key Takeaways

| Aspect | Bi-Encoder | Cross-Encoder |
|--------|-----------|---------------|
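| **Speed** | ✓ Fast (one encoding per query) | ✗ Slow (one forward pass per pair) |
| **Scalability** | ✓ Excellent (vector DB) | ✗ Limited to small candidate sets |
| **Accuracy** | ✗ Typically lower | ✓ Typically higher |
| **Pre-computation** | ✓ Yes (document embeddings) | ✗ No |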
| **Use Case** | First-stage retrieval | Reranking top-K results |

### Recommended Pipeline

1. **Bi-Encoder**: Retrieve top 50-100 candidates from large corpus
2. **Cross-Encoder**: Rerank top candidates to get best 5-10
3. **LLM**: Generate final answer using reranked documents

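This two-stage approach balances speed and accuracy: the bi-encoder narrows a large corpus cheaply, and the cross-encoder spends its extra compute only on the shortlist.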
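
The first two pipeline stages can be sketched as a single function that takes the two scorers as arguments. Everything here is an assumption made for illustration: the name `retrieve_then_rerank`, its parameters, and the word-overlap scorers that stand in for a real bi-encoder and cross-encoder so the example runs without downloading models. In practice, stage 1 would usually be a vector database lookup rather than a scan over the corpus.

```python
from typing import Callable, List, Sequence, Tuple

Scorer = Callable[[str, str], float]


def retrieve_then_rerank(
    query: str,
    corpus: Sequence[str],
    cheap_score: Scorer,
    precise_score: Scorer,
    retrieve_k: int = 50,
    final_k: int = 5,
) -> List[Tuple[float, str]]:
    """Shortlist with a cheap scorer, then rerank the shortlist with a precise one."""
    # Stage 1: keep the retrieve_k best documents according to the cheap scorer.
    candidates = sorted(corpus, key=lambda doc: cheap_score(query, doc), reverse=True)
    candidates = candidates[:retrieve_k]
    # Stage 2: rescore only the shortlist with the expensive scorer.
    reranked = [(precise_score(query, doc), doc) for doc in candidates]
    reranked.sort(key=lambda pair: pair[0], reverse=True)
    return reranked[:final_k]


# Stand-in scorers based on word overlap (hypothetical, not real models).
def word_overlap(q: str, d: str) -> float:
    return float(len(set(q.lower().split()) & set(d.lower().split())))


def overlap_ratio(q: str, d: str) -> float:
    words = set(d.lower().split())
    return len(set(q.lower().split()) & words) / max(len(words), 1)


toy_corpus = ["the eiffel tower is in paris", "paris is a city", "bananas are yellow"]
results = retrieve_then_rerank(
    "where is the eiffel tower", toy_corpus, word_overlap, overlap_ratio,
    retrieve_k=2, final_k=1,
)
print(results)
```

Passing the scorers in as callables keeps the sketch independent of any particular model. The returned documents would then be handed to the LLM in stage 3.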