```{contents}
```
## Re-Ranking

**Re-ranking** is the process of improving retrieval quality by **reordering an initial set of candidate results** using a more accurate (but more expensive) model after a fast retrieval stage.
It is a core component of modern **retrieval-augmented generation (RAG)**, search engines, and recommendation systems.

---

### Core Intuition

Fast retrieval methods (BM25, vector search) are built to scan large corpora quickly and with high recall, so their top results are often imprecise.
Re-ranking spends additional computation on a much smaller candidate set in exchange for **higher answer quality**.

> **Retrieve broadly → then rank precisely.**

---

### Where Re-Ranking Fits in the Pipeline

```
User Query
   ↓
Initial Retrieval (BM25 / Vector / Hybrid)
   ↓
Top-K Candidates (e.g., 50–200)
   ↓
Re-Ranker (cross-encoder / LLM / ML model)
   ↓
Top-N Final Results (e.g., 5–20)
```
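
The flow above can be sketched as a generic two-stage function. This is a minimal illustration, not a production retriever: `two_stage_search`, `word_overlap`, and `jaccard` are hypothetical names, and the toy scorers only stand in for BM25/vector search (stage 1) and a cross-encoder (stage 2).

```python
from typing import Callable, List, Tuple

# A scorer maps (query, doc) to a relevance score; higher means more relevant.
Scorer = Callable[[str, str], float]


def two_stage_search(
    query: str,
    corpus: List[str],
    cheap_score: Scorer,
    precise_score: Scorer,
    top_k: int = 50,
    top_n: int = 5,
) -> List[Tuple[str, float]]:
    """Retrieve top_k candidates with a cheap scorer, then re-rank them with
    an expensive scorer and return the best top_n (doc, score) pairs."""
    # Stage 1: score every document cheaply and keep only the top_k.
    candidates = sorted(corpus, key=lambda d: cheap_score(query, d), reverse=True)[:top_k]
    # Stage 2: run the expensive scorer on the candidates only.
    rescored = [(d, precise_score(query, d)) for d in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:top_n]


def word_overlap(query: str, doc: str) -> float:
    """Toy first-stage scorer: number of shared lowercase words."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def jaccard(query: str, doc: str) -> float:
    """Toy stand-in for an expensive re-ranker: Jaccard similarity of word sets."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d)


corpus = [
    "Python garbage collection explained",
    "Optimizing memory usage in Python applications",
    "Installing Python on Windows",
    "How to fix a memory leak",
]
for doc, score in two_stage_search("fix memory leak in python", corpus, word_overlap, jaccard, top_k=3, top_n=2):
    print(f"{score:.2f}  {doc}")
```

The key property is that `precise_score` runs at most `top_k` times, regardless of corpus size.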

---

### Types of Re-Rankers

| Type                    | Description                                               | Relative speed | Relative accuracy |
| ----------------------- | --------------------------------------------------------- | -------------- | ----------------- |
| Score fusion (hybrid)   | Combine keyword (BM25) and vector-search scores or ranks  | Fast           | Medium            |
| Cross-encoder           | Jointly encode query and document in one model pass       | Medium         | High              |
| LLM-based               | Prompt an LLM to score or order the candidates            | Slow           | Very high         |
| Learning-to-rank (LTR)  | Model trained on relevance labels to order candidates     | Medium         | High              |
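
Score fusion is often done with reciprocal rank fusion (RRF), which combines ranks rather than raw scores, so BM25 scores and cosine similarities on different scales need no normalization. A minimal sketch: `k=60` is the constant commonly used with RRF, and the document IDs and rankings are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document receives sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. A list that omits a document adds nothing for it.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical rankings from a keyword (BM25) retriever and a vector retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_c", "doc_a", "doc_d"]
print(reciprocal_rank_fusion([bm25_ranking, vector_ranking]))
# → ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents ranked well by both retrievers (`doc_a`, `doc_c`) rise above documents that only one retriever found.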

---

### Why Re-Ranking Matters

| Without Re-Ranking                               | With Re-Ranking                    |
| ------------------------------------------------ | ---------------------------------- |
| Relevant and irrelevant results mixed at the top | Most relevant results ranked first |
| Weaker grounding for RAG                         | Stronger grounding for RAG         |
| Higher risk of hallucination                     | Lower risk of hallucination        |
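
These effects are worth checking on your own data with standard ranking metrics. A minimal sketch of precision@k and reciprocal rank computed before and after re-ranking; the relevance labels and both orderings are hypothetical, chosen only to illustrate the calculation, not measured results.

```python
from typing import List, Set


def precision_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(doc in relevant for doc in ranking[:k]) / k


def reciprocal_rank(ranking: List[str], relevant: Set[str]) -> float:
    """1 / position of the first relevant result (0.0 if none is found)."""
    for position, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / position
    return 0.0


# Hypothetical relevance judgments and orderings, for illustration only.
relevant = {"d2", "d5"}
before = ["d1", "d3", "d2", "d4", "d5"]  # order from initial retrieval
after = ["d2", "d5", "d1", "d3", "d4"]   # order after re-ranking

for name, ranking in [("before", before), ("after", after)]:
    p3 = precision_at_k(ranking, relevant, k=3)
    rr = reciprocal_rank(ranking, relevant)
    print(f"{name}: P@3={p3:.2f}  RR={rr:.2f}")
```

Averaging reciprocal rank over many queries gives mean reciprocal rank (MRR).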

---

### Simple Demonstration (Python)

```python
from sentence_transformers import CrossEncoder

# Query and candidate docs, as returned by an initial retrieval stage
query = "how to fix memory leak in python"
docs = [
    "Python garbage collection explained",
    "Optimizing memory usage in Python applications",
    "Installing Python on Windows",
]

# Cross-encoder re-ranker trained on MS MARCO passage ranking
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

# Score each (query, doc) pair jointly; a higher score means more relevant
scores = model.predict([(query, d) for d in docs])

# Sort candidates by score, best first
ranked = sorted(zip(docs, scores), key=lambda x: x[1], reverse=True)
for doc, score in ranked:
    print(f"{score:.3f}  {doc}")
```

---

### Re-Ranking in RAG

```
User → Retrieve (100 docs) → Re-rank (keep top 20) → LLM → Answer
```

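Because the LLM's context window is limited, re-ranking decides which retrieved passages the model actually sees. Passing fewer, more relevant passages improves **context quality** for generation.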
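
After re-ranking, the top passages still have to fit the LLM's context window. A minimal sketch of that hand-off, assuming the re-ranker returns `(passage, score)` pairs already sorted best-first; `build_context` is a hypothetical helper, and whitespace-separated words are only a rough stand-in for model tokens.

```python
from typing import List, Tuple


def build_context(ranked: List[Tuple[str, float]], max_words: int = 300, top_n: int = 20) -> str:
    """Concatenate the highest-ranked passages until a word budget is reached."""
    selected: List[str] = []
    used = 0
    for passage, _score in ranked[:top_n]:
        n_words = len(passage.split())
        if used + n_words > max_words:
            break  # stop at the first passage that would overflow the budget
        selected.append(passage)
        used += n_words
    return "\n\n".join(selected)


# Hypothetical re-ranker output, best first.
ranked = [
    ("Optimizing memory usage in Python applications", 0.91),
    ("Python garbage collection explained", 0.55),
    ("Installing Python on Windows", 0.02),
]
context = build_context(ranked, max_words=12)
prompt = f"Answer using only the context below.\n\n{context}\n\nQuestion: how to fix memory leak in python"
print(prompt)
```

Stopping at the first overflow keeps the ranking order intact; a variant could skip a long passage and try shorter, lower-ranked ones instead.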

---

### Applications

* Enterprise search
* Legal & medical QA
* E-commerce search
* Knowledge assistants
* Recommendation systems

---

### Best Practices

* Keep initial retrieval broad (optimize it for recall)
* Re-rank only the top-K candidates to control latency
* Prefer cross-encoders for quality-critical systems
* Cache re-ranking scores for frequent (query, document) pairs
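
The top-K and caching advice can be combined in a small wrapper. A minimal sketch: `make_cached_reranker` is a hypothetical helper, `score_fn` stands in for any expensive (query, doc) scorer such as a cross-encoder, and the cache lives in process memory only.

```python
from functools import lru_cache
from typing import Callable, List, Tuple


def make_cached_reranker(
    score_fn: Callable[[str, str], float], top_k: int = 50, cache_size: int = 10_000
):
    """Wrap an expensive (query, doc) scorer with an LRU cache and a top-K cap."""

    @lru_cache(maxsize=cache_size)
    def cached_score(query: str, doc: str) -> float:
        return score_fn(query, doc)

    def rerank(query: str, candidates: List[str], top_n: int = 10) -> List[Tuple[str, float]]:
        # Only the first top_k candidates from initial retrieval are re-scored.
        scored = [(doc, cached_score(query, doc)) for doc in candidates[:top_k]]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_n]

    rerank.cache_info = cached_score.cache_info  # expose hit/miss statistics
    return rerank


# Toy scorer standing in for a slow model: longer docs score higher.
rerank = make_cached_reranker(lambda q, d: float(len(d)), top_k=2)
print(rerank("q", ["aaa", "b", "cc"]))
print(rerank("q", ["aaa", "b", "cc"]))  # served from the cache
print(rerank.cache_info())
```

Scoring pairs one at a time through the cache gives up the batching that `CrossEncoder.predict` performs over a list of pairs; a production version might look up cached pairs first and send only the misses to the model in one batch.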

---

### Summary

| Property      | Value                 |
| ------------- | --------------------- |
| Purpose       | Improve precision     |
| Position      | After retrieval       |
| Core benefit  | Higher answer quality |
| Impact on RAG | Very high             |

