```{contents}
```
## Maximal Marginal Relevance (MMR)


**Maximal Marginal Relevance (MMR)** is a **retrieval re-ranking strategy** that balances:

* **Relevance** to the query
* **Diversity** among retrieved documents

> MMR reduces redundancy by avoiding multiple chunks that say the same thing.

It is commonly used **after similarity search** and **before sending context to the LLM**.

---

### Why MMR Is Needed

Pure similarity search often returns:

* Highly similar chunks
* Near-duplicate passages
* Redundant context

This causes:

* Wasted context window
* Lower answer quality
* Higher hallucination risk

MMR ensures the retrieved set covers **different aspects** of the query.

---

### Conceptual Goal

```
Maximize relevance
AND
Minimize redundancy
```

MMR explicitly trades off between the two.

---

### The MMR Formula (Intuition)

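For each candidate document `d`:

```
MMR(d) = λ · sim(d, query)
         − (1 − λ) · max sim(d, selected_docs)
```

Where:

* `sim(d, query)` → relevance
* `max sim(d, selected_docs)` → redundancy (the highest similarity to any already selected document)
* `λ` (lambda) ∈ [0, 1] controls the tradeoff

The candidate with the highest score is selected next.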
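
As a rough illustration of the formula, the sketch below plugs numbers into it for a single candidate. The function name `mmr_score` and the similarity values are invented for this example and do not come from any library.

```python
def mmr_score(sim_to_query, sims_to_selected, lam=0.5):
    """Score one candidate: lam * relevance - (1 - lam) * redundancy."""
    # Redundancy is the largest similarity to any already selected document;
    # with nothing selected yet, no penalty applies.
    redundancy = max(sims_to_selected, default=0.0)
    return lam * sim_to_query - (1 - lam) * redundancy


# Invented similarities: the candidate is 0.8 similar to the query and
# 0.9 / 0.3 similar to the two documents selected so far.
score = mmr_score(0.8, [0.9, 0.3], lam=0.6)
print(round(score, 3))  # 0.6 * 0.8 - 0.4 * 0.9
```

Only the worst-case overlap (0.9 here) counts toward the penalty, so a candidate is punished for duplicating any one selected document, not for its average similarity to all of them.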

---

### Lambda (λ) Explained

| λ value | Behavior                                             |
| ------- | ---------------------------------------------------- |
| 1.0     | Pure relevance (same as similarity search)           |
| 0.5     | Balanced relevance + diversity                       |
| 0.0     | Maximum diversity, relevance ignored (rarely useful) |

**Typical production value:** `λ = 0.5–0.7`
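
To make the effect of λ concrete, the sketch below compares two hypothetical candidates against one already selected document. All similarity values are invented: the near-duplicate is highly relevant but almost identical to the selected document, while the novel candidate is less relevant but covers different ground.

```python
def tradeoff_score(relevance, redundancy, lam):
    """lam * relevance - (1 - lam) * redundancy."""
    return lam * relevance - (1 - lam) * redundancy


# (relevance to query, max similarity to already selected docs), invented values
NEAR_DUPLICATE = (0.88, 0.95)
NOVEL = (0.60, 0.20)


def winner(lam):
    """Return which hypothetical candidate MMR would pick next at this lambda."""
    dup = tradeoff_score(*NEAR_DUPLICATE, lam)
    new = tradeoff_score(*NOVEL, lam)
    return "near-duplicate" if dup > new else "novel"


for lam in (1.0, 0.9, 0.7, 0.5):
    print(f"lambda={lam:.1f} -> {winner(lam)}")
```

With these particular numbers the near-duplicate wins at λ = 1.0 and 0.9, and the novel candidate wins at 0.7 and 0.5. The crossover point depends entirely on the similarities involved, which is one reason λ is usually tuned per dataset.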

---

### Where MMR Fits in RAG

```
Query
  ↓
Vector / Hybrid Retrieval (Top-N)
  ↓
MMR Selection (Diverse Top-K)
  ↓
LLM
```

MMR is a **query-time** operation.

---

### How MMR Works Step-by-Step

1. Select the document most relevant to the query
2. For each remaining candidate, compute its MMR score, which penalizes similarity to already selected docs
3. Select the highest-scoring candidate, i.e. the one that adds the most **new information**
4. Repeat steps 2 and 3 until K documents are selected
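
The steps above can be sketched as a greedy loop over embedding vectors. This is a minimal pure-Python illustration, not the code any particular library uses; the names `cosine` and `mmr_select` and the toy 2-D vectors are assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k=3, lam=0.5):
    """Return the indices of up to k documents chosen greedily by MMR."""
    relevance = [cosine(d, query_vec) for d in doc_vecs]
    remaining = list(range(len(doc_vecs)))
    selected = []
    while remaining and len(selected) < k:
        def score(i):
            # Penalize the highest similarity to anything already selected.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance[i] - (1 - lam) * redundancy

        # On the first pass nothing is selected, so this is the most relevant doc.
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy vectors: docs 0 and 1 point in almost the same direction, doc 2 differs.
docs = [[1.0, 0.1], [1.0, 0.05], [0.0, 1.0]]
query = [1.0, 1.0]
print(mmr_select(query, docs, k=2, lam=1.0))  # relevance only
print(mmr_select(query, docs, k=2, lam=0.5))  # relevance and diversity
```

With `lam=1.0` the two near-identical vectors are returned, while `lam=0.5` swaps the second one for the vector pointing in a different direction.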

---

### MMR vs Similarity Search

| Aspect             | Similarity Search | MMR                   |
| ------------------ | ----------------- | --------------------- |
| Goal               | Relevance only    | Relevance + diversity |
| Redundancy         | High              | Low                   |
| Context efficiency | Lower             | Higher                |
| LLM answer quality | Medium            | Higher                |

---

### MMR Demonstration (LangChain)

#### Using MMR in a Retriever




```python
from langchain_community.vectorstores import FAISS

# Build the vectorstore. `texts` (a list of strings) and `embeddings`
# (an embedding model) are assumed to be defined in an earlier cell.
# Skip this line if `vectorstore` already exists.
vectorstore = FAISS.from_texts(texts, embeddings)

retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={
        "k": 5,              # final results returned after MMR
        "fetch_k": 20,       # initial candidate pool from similarity search
        "lambda_mult": 0.6,  # λ: 1.0 = pure relevance, 0.0 = maximum diversity
    },
)
```

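* `fetch_k`: how many candidates the initial similarity search fetches
* `k`: how many documents are returned after MMR selection (should be smaller than `fetch_k`)
* `lambda_mult`: the λ value (1.0 = pure relevance, 0.0 = maximum diversity)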

---

#### Querying with MMR

```python
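# `invoke` is the current retriever entry point; `get_relevant_documents`
# is deprecated in recent LangChain releases.
docs = retriever.invoke(
    "How does Jira ticket escalation work?"
)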
```

The returned documents tend to be:

* Relevant to the query
* Less redundant than plain top-k results
* Spread across multiple facets of the topic

---

### MMR vs Reranking

| Aspect    | MMR       | Reranking   |
| --------- | --------- | ----------- |
| Type      | Heuristic | Model-based |
| Cost      | Low       | High        |
| Latency   | Very low  | Higher      |
| Precision | Medium    | High        |
| Diversity | High      | Medium      |

**Production pattern:**
`Hybrid Search → MMR → Reranker → LLM`

---

### When MMR Helps Most

* Highly repetitive documents
* Long manuals or policies
* Logs and ticket histories
* RAG systems with chunk overlap
* When context window is tight

---

### When MMR Is Less Useful

* Very small datasets
* Highly precise fact lookup
* When reranking is already applied aggressively

---

### MMR and Chunk Overlap

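Chunk overlap increases redundancy because adjacent chunks share text.
MMR counteracts this by:

* Penalizing chunks that overlap heavily with one already selected, so typically only one chunk per overlapping region is kept
* Preserving coverage while reducing noise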
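
To see why overlapping chunks look redundant to MMR, the sketch below measures how much adjacent chunks share. It uses word-set Jaccard similarity as a cheap stand-in for embedding similarity; the helpers `chunk_words` and `jaccard` are hypothetical names for this example, and `overlap` is assumed to be smaller than `size`.

```python
def chunk_words(words, size, overlap):
    """Split a word list into windows of `size` words sharing `overlap` words."""
    step = size - overlap  # assumes overlap < size
    last_start = max(len(words) - overlap, 1)
    return [words[i:i + size] for i in range(0, last_start, step)]


def jaccard(a, b):
    """Shared words divided by total distinct words."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


words = [f"w{i}" for i in range(20)]
no_overlap = chunk_words(words, size=10, overlap=0)
with_overlap = chunk_words(words, size=10, overlap=5)

print(jaccard(no_overlap[0], no_overlap[1]))      # adjacent chunks share nothing
print(jaccard(with_overlap[0], with_overlap[1]))  # 5 shared words out of 15
```

The higher the similarity between adjacent chunks, the larger the redundancy penalty the second chunk receives once the first has been selected.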

---

### Production Tuning Guidelines

#### Recommended Defaults

```text
fetch_k = 20–50
k = 3–7
lambda_mult = 0.5–0.7
```

#### Tradeoffs

* Higher `fetch_k` → larger candidate pool and more room for diversity, at higher retrieval and scoring cost
* Lower `lambda_mult` → more diversity, less relevance
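
The `fetch_k` tradeoff can be illustrated with a small two-stage sketch that works on precomputed similarities. The function `mmr_from_pool` and all similarity values are invented for the example: documents 0 to 2 are near-duplicates of each other, and document 3 is distinct but less relevant.

```python
def mmr_from_pool(query_sims, doc_sims, k, fetch_k, lam=0.5):
    """Keep the fetch_k most relevant docs, then pick k of them by MMR."""
    # Stage 1: plain similarity search builds the candidate pool.
    ranked = sorted(range(len(query_sims)), key=lambda i: query_sims[i], reverse=True)
    pool = ranked[:fetch_k]
    # Stage 2: greedy MMR selection, restricted to that pool.
    selected = []
    while pool and len(selected) < k:
        def score(i):
            redundancy = max((doc_sims[i][j] for j in selected), default=0.0)
            return lam * query_sims[i] - (1 - lam) * redundancy

        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return selected


query_sims = [0.90, 0.89, 0.88, 0.60]
doc_sims = [
    [1.00, 0.95, 0.94, 0.20],
    [0.95, 1.00, 0.96, 0.25],
    [0.94, 0.96, 1.00, 0.22],
    [0.20, 0.25, 0.22, 1.00],
]

print(mmr_from_pool(query_sims, doc_sims, k=2, fetch_k=2))  # pool too small to diversify
print(mmr_from_pool(query_sims, doc_sims, k=2, fetch_k=4))  # distinct doc is reachable
```

When `fetch_k` equals `k`, MMR can only reorder the same documents that similarity search already returned. The distinct document becomes selectable only once the pool is large enough to include it.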

---

### Common Mistakes

#### Using MMR alone

❌ Does not replace reranking

#### Very low lambda

❌ Results become off-topic

#### Small fetch_k

❌ If `fetch_k` is close to `k`, MMR has almost no candidates to choose between, so there is little diversity benefit

#### Applying MMR at ingestion time

❌ MMR scores depend on the query, so it can only run at query time

---

### Best Practices

* Use MMR after initial retrieval
* Combine with hybrid search
* Tune lambda per domain
* Log selected chunks for observability
* Pair with reranking for accuracy

---

### Interview-Ready Summary

> “Maximal Marginal Relevance (MMR) is a retrieval strategy that balances relevance to the query with diversity among selected documents. It reduces redundancy in RAG systems and improves context efficiency.”

---

### Rule of Thumb

* **Similarity search → relevance**
* **MMR → diversity**
* **Reranker → precision**
* **Best RAG = all three**

