```{contents}
```
## Hybrid Search


**Hybrid Search** is a retrieval strategy that **combines semantic similarity search (embeddings)** with **lexical/keyword search (BM25 / TF-IDF)** to retrieve the most relevant documents.

> Hybrid Search = **Meaning-based retrieval + Keyword-based retrieval**

It is widely treated as the **production-standard retrieval approach** for enterprise RAG systems.

---

### Why Hybrid Search Is Needed

Neither approach alone is sufficient:

#### Problems with Semantic (Vector) Search

* Misses exact keywords (IDs, error codes)
* Struggles with rare terms
* Weak on numbers, versions, acronyms

#### Problems with Keyword Search

* No semantic understanding
* Fails on paraphrases and synonyms
* Brittle to language variation

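Hybrid search **balances recall and precision**: the lexical side catches exact identifiers and rare terms, the semantic side catches paraphrases, so each covers the other's blind spots.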

---

### Conceptual Flow

```
User Query
   ├── Keyword Search (BM25)
   └── Vector Similarity Search
   ↓
Candidate Merge
   ↓
Score Normalization
   ↓
(Optional) Reranking
   ↓
Top-K Results
```

---

### Where Hybrid Search Fits in RAG

```
Query
  ↓
Hybrid Retriever
  ↓
Relevant Chunks
  ↓
Prompt
  ↓
LLM
```

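Hybrid search is a **query-time operation**: the BM25 index and the vector index are both built at ingestion time, and the hybrid retriever queries them together when a request arrives.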

---

### Core Components of Hybrid Search

#### Lexical Retriever

* BM25 / TF-IDF
* Exact term matching
* Strong on identifiers and structure
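As a rough illustration of what the lexical retriever computes, here is a minimal from-scratch Okapi BM25 scorer. It is a sketch, not what any particular library does: it assumes lowercase whitespace tokenization, the common defaults `k1 = 1.5` and `b = 0.75`, and a smoothed IDF. Libraries such as `rank_bm25` (which LangChain's `BM25Retriever` wraps) differ in tokenization and IDF details.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase + whitespace split; production systems use real analyzers
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: how many documents contain each term
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # exact-match only: no credit for synonyms
            # Smoothed IDF: the "+ 1" inside the log keeps it non-negative
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            freq = tf[term]
            # k1 saturates repeated terms; b normalizes for document length
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(score)
    return scores


docs = [
    "HTTP error 500 when escalating a Jira ticket",
    "How to escalate a support request to tier 2",
    "Server returned an internal failure during escalation",
]
print(bm25_scores("error 500", docs))
```

Only the first document scores above zero. The third describes the same problem in different words ("internal failure") and gets nothing, which is the lexical blind spot listed earlier.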

#### Vector Retriever

* Embedding similarity
* Semantic meaning
* Strong on paraphrases
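The vector retriever ranks documents by the similarity between the query embedding and each document embedding, most often cosine similarity. The sketch below uses hand-written 3-dimensional vectors as hypothetical stand-ins for real embeddings (real models produce hundreds of dimensions), so only the ranking mechanics are meaningful.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Return the top-k (doc_id, similarity) pairs, best first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical toy "embeddings"; a real embedding model would produce these
doc_vecs = {
    "escalation_guide": [0.9, 0.1, 0.0],
    "server_failure_faq": [0.7, 0.6, 0.1],
    "billing_policy": [0.0, 0.1, 0.9],
}
query_vec = [0.8, 0.4, 0.0]  # imagine this encodes "ticket escalation failed"
print(vector_search(query_vec, doc_vecs))
```

Neither top result needs to share a word with the query; proximity in embedding space is all that matters, which is why this retriever handles paraphrases but can miss an exact identifier.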

#### Score Fusion

* Combines the results of both retrievers into a single ranking, either by weighting normalized scores or by fusing ranks

---

### Hybrid Search Strategies

#### Strategy 1: Union + Rerank (Most Common)

```
BM25 Top-K
UNION
Vector Top-K
→ Reranker
```

* High recall
* Reranker improves precision
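A minimal sketch of Strategy 1, assuming both retrievers return ranked lists of document IDs. The union keeps every candidate once, and a reranker then decides the final order. `overlap_score` is a hypothetical placeholder for a real reranker such as a cross-encoder model; only the control flow is the point here.

```python
from typing import Callable


def union_candidates(*ranked_lists: list[str]) -> list[str]:
    """Merge ranked ID lists, dropping duplicates but keeping first-seen order."""
    seen: set[str] = set()
    merged: list[str] = []
    for ranked in ranked_lists:
        for doc_id in ranked:
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


def union_and_rerank(
    query: str,
    bm25_ids: list[str],
    vector_ids: list[str],
    corpus: dict[str, str],
    score_fn: Callable[[str, str], float],
    top_n: int = 3,
) -> list[str]:
    """Union both candidate sets, then let the reranker decide the final order."""
    candidates = union_candidates(bm25_ids, vector_ids)
    # The reranker scores each (query, document text) pair; higher is better
    ranked = sorted(candidates, key=lambda d: score_fn(query, corpus[d]), reverse=True)
    return ranked[:top_n]


def overlap_score(query: str, text: str) -> float:
    """Hypothetical stand-in for a cross-encoder: fraction of query terms present."""
    q_terms = set(query.lower().split())
    return len(q_terms & set(text.lower().split())) / len(q_terms)


corpus = {
    "d1": "jira error 500 on escalation",
    "d2": "escalation policy for tickets",
    "d3": "error 500 troubleshooting",
    "d4": "jira ticket escalation error 500 workaround",
}
print(union_and_rerank("jira escalation error 500", ["d3", "d1"], ["d2", "d1", "d4"], corpus, overlap_score))
```

Because the final order comes entirely from the reranker, this strategy sidesteps the problem of making BM25 and vector scores comparable.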

---

#### Strategy 2: Weighted Score Fusion

```
final_score = α * vector_score + (1 - α) * keyword_score
```

* Faster than reranking (no extra model call)
* Requires tuning α, and both scores must first be normalized to a common range
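The formula above only makes sense once both scores live on the same scale: raw BM25 scores are unbounded, while cosine similarity is bounded. A minimal sketch, assuming min-max normalization per retriever and treating a document missing from one retriever's results as scoring 0 on that side (one common choice, not the only one):

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them all to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_fusion(
    vector_scores: dict[str, float],
    keyword_scores: dict[str, float],
    alpha: float = 0.6,
) -> list[tuple[str, float]]:
    """final_score = alpha * vector_score + (1 - alpha) * keyword_score, after normalization."""
    v = min_max(vector_scores)
    kw = min_max(keyword_scores)
    # Assumption: a document absent from one retriever contributes 0 from that side
    docs = set(v) | set(kw)
    fused = {d: alpha * v.get(d, 0.0) + (1 - alpha) * kw.get(d, 0.0) for d in docs}
    return sorted(fused.items(), key=lambda pair: pair[1], reverse=True)


vector_scores = {"d1": 0.82, "d2": 0.91, "d3": 0.40}  # cosine similarities
keyword_scores = {"d1": 12.3, "d3": 7.1, "d4": 2.0}  # raw BM25, unbounded
print(weighted_fusion(vector_scores, keyword_scores, alpha=0.6))
```

With `alpha=0.6`, `d1` ranks first because it is strong on both sides, even though `d2` has the best vector score. One known weakness of min-max is also visible: the lowest-scoring document in each list is pinned to exactly 0.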

---

#### Strategy 3: Filter + Semantic Search

```
Keyword Filter → Vector Search
```

Used when:

* Hard constraints exist (tenant, product, date)
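A sketch of Strategy 3, assuming each document carries a metadata dict and the hard constraint is an exact-match filter on it. The filter runs first, so a document from another tenant can never be returned, however similar it is. The field names (`tenant_id`, `product`) and vectors are illustrative.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_vector_search(query_vec: list[float], docs: list[dict], required: dict, k: int = 2) -> list[str]:
    """Apply hard metadata constraints first, then rank the survivors by similarity."""
    allowed = [
        d for d in docs
        if all(d["metadata"].get(key) == value for key, value in required.items())
    ]
    ranked = sorted(allowed, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [d["id"] for d in ranked[:k]]


docs = [
    {"id": "a1", "vec": [0.9, 0.1], "metadata": {"tenant_id": "org_123", "product": "jira"}},
    {"id": "a2", "vec": [0.2, 0.9], "metadata": {"tenant_id": "org_123", "product": "jira"}},
    {"id": "b1", "vec": [0.95, 0.05], "metadata": {"tenant_id": "org_456", "product": "jira"}},
]
print(filtered_vector_search([1.0, 0.0], docs, {"tenant_id": "org_123"}))  # ['a1', 'a2']
```

`b1` is the closest match to the query but belongs to `org_456`, so it is excluded before ranking.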

---

### Hybrid Search Demonstration (LangChain Conceptual)

#### Vector Retriever

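```python
# `vectorstore` is any LangChain vector store already populated with `documents`
# (e.g. FAISS or Chroma); building it is outside the scope of this snippet
vector_retriever = vectorstore.as_retriever(
    search_kwargs={"k": 5}  # top-5 nearest chunks by embedding similarity
)
```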

---

#### Keyword Retriever (BM25)

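```python
# Requires: pip install langchain-community rank_bm25
from langchain_community.retrievers import BM25Retriever

# Builds an in-memory BM25 index over the same `documents` (a list of Document objects)
bm25 = BM25Retriever.from_documents(documents)
bm25.k = 5  # return the top-5 keyword matches
```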

---

#### Ensemble Retriever (Hybrid)

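```python
from langchain.retrievers import EnsembleRetriever

# EnsembleRetriever merges the ranked lists with weighted Reciprocal Rank Fusion;
# the weights scale each retriever's rank contribution (40% BM25, 60% vector)
hybrid_retriever = EnsembleRetriever(
    retrievers=[bm25, vector_retriever],
    weights=[0.4, 0.6]
)
```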

---

#### Querying Hybrid Search

```python
# Retrievers are Runnables: `invoke` replaces the deprecated `get_relevant_documents`
docs = hybrid_retriever.invoke(
    "Jira ticket escalation error 500"
)
```

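The results typically mix:

* Exact keyword matches (`error 500`), surfaced by BM25
* Semantically related context (e.g. escalation procedures that never mention "error 500"), surfaced by the vector retriever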

---

### Hybrid Search with Reranking (Production Pattern)

```
Hybrid Retriever → Top 20
        ↓
Cross-Encoder Reranker
        ↓
Top 5
        ↓
LLM
```

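This yields:

* High recall (the hybrid stage casts a wide net of 20 candidates)
* High precision (the cross-encoder scores each query-document pair jointly)
* More stable answers (the LLM receives a short, relevant context)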

---

### Hybrid Search vs Pure Vector Search

| Aspect               | Vector Only | Hybrid |
| -------------------- | ----------- | ------ |
| Semantic recall      | High        | High   |
| Keyword precision    | Low         | High   |
| Rare terms           | Weak        | Strong |
| Production readiness | Medium      | High   |

---

### Hybrid Search vs MMR

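| Aspect        | Hybrid Search                | MMR                                 |
| ------------- | ---------------------------- | ----------------------------------- |
| Purpose       | Recall & precision           | Diversity (fewer redundant chunks)  |
| Stage         | Retrieval                    | Post-retrieval selection            |
| Combinable    | Yes, supplies the candidates | Yes, re-selects from hybrid results |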
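To make the "Stage" row concrete: MMR (Maximal Marginal Relevance) greedily re-selects from an already retrieved candidate pool, such as hybrid search output, trading relevance to the query against similarity to documents already picked. A minimal sketch; the `lam` weight, toy vectors, and function names are illustrative assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec: list[float], candidates: dict[str, list[float]], k: int = 2, lam: float = 0.5) -> list[str]:
    """Greedy MMR: balance relevance to the query against redundancy with earlier picks."""
    selected: list[str] = []
    remaining = list(candidates)

    def mmr_score(doc_id: str) -> float:
        relevance = cosine(query_vec, candidates[doc_id])
        # Redundancy = similarity to the closest already-selected document
        redundancy = max((cosine(candidates[doc_id], candidates[s]) for s in selected), default=0.0)
        return lam * relevance - (1 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy candidate pool, e.g. the output of a hybrid retriever; d2 nearly duplicates d1
candidates = {"d1": [1.0, 0.0], "d2": [0.99, 0.1], "d3": [0.6, 0.8]}
print(mmr_select([1.0, 0.2], candidates, k=2))  # ['d2', 'd3']
```

Plain top-2 by relevance would return `d2` and its near-duplicate `d1`; MMR swaps `d1` for the more diverse `d3`.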

---

### Production-Grade Considerations

#### Score Normalization

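Vector and BM25 scores are not comparable by default: BM25 scores are unbounded and depend on the corpus and query, while cosine similarity is bounded to [-1, 1].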

Solutions:

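* Min–max normalization (rescale each retriever's scores to [0, 1] before weighting)
* Rank-based fusion (combine positions instead of raw scores)
* Reciprocal Rank Fusion (RRF), the most common rank-based method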
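Reciprocal Rank Fusion avoids score calibration entirely: each document receives `1 / (k + rank)` from every list it appears in, and those contributions are summed. A minimal sketch; `k = 60` is the constant commonly used in practice (and the default in LangChain's `EnsembleRetriever`), and the document IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """RRF: score(d) = sum over lists of 1 / (k + rank), with ranks starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


bm25_ranking = ["d3", "d1", "d7"]
vector_ranking = ["d1", "d2", "d3"]
print(reciprocal_rank_fusion([bm25_ranking, vector_ranking]))
```

`d1` and `d3` appear in both lists and rise to the top; documents found by only one retriever follow. No normalization step is needed because raw scores are never used.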

---

#### Latency Budget

Typical production target:

* Hybrid retrieval: < 100 ms
* Reranking: < 150 ms

Optimization:

* Small k per retriever
* Parallel retrieval
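Because the two retrievers are independent, they can run concurrently, so retrieval latency is roughly that of the slower one rather than the sum of both. A sketch using a thread pool, which suits I/O-bound calls to search services; `bm25_search` and `vector_search` are hypothetical stand-ins that just sleep for 50 ms.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def bm25_search(query: str, k: int) -> list[str]:
    time.sleep(0.05)  # hypothetical stand-in for a BM25 index call
    return [f"bm25_{i}" for i in range(k)]


def vector_search(query: str, k: int) -> list[str]:
    time.sleep(0.05)  # hypothetical stand-in for a vector DB call
    return [f"vec_{i}" for i in range(k)]


def parallel_retrieve(query: str, k: int = 5) -> tuple[list[str], list[str]]:
    """Submit both retrievers at once and wait for both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        bm25_future = pool.submit(bm25_search, query, k)
        vector_future = pool.submit(vector_search, query, k)
        return bm25_future.result(), vector_future.result()


start = time.perf_counter()
bm25_ids, vector_ids = parallel_retrieve("error 500", k=5)
print(f"retrieved in {time.perf_counter() - start:.3f}s")  # close to 0.05s, not 0.10s
```

With async clients, `asyncio.gather` achieves the same overlap without threads.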

---

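#### Metadata Filtering (Mandatory)

```python
# Filter syntax depends on the vector store; e.g. Chroma and FAISS accept an equality dict
vector_retriever = vectorstore.as_retriever(
    search_kwargs={"k": 5, "filter": {"tenant_id": "org_123"}}
)
```

Applied **before** retrieval (pre-filtering: every returned chunk already satisfies the constraint) or **after** retrieval (post-filtering: simpler, but can leave fewer than k results). LangChain's `BM25Retriever` has no metadata filter, so the keyword side needs its own isolation, such as a separate index per tenant.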
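The difference between filtering before and after retrieval shows up in how many results survive. A toy sketch, assuming retrieval has already produced a relevance-ordered list and only the placement of the filter varies:

```python
def post_filter(ranked_docs: list[dict], tenant_id: str, k: int) -> list[str]:
    """Take the top-k first, then drop other tenants: may return fewer than k."""
    return [d["id"] for d in ranked_docs[:k] if d["tenant_id"] == tenant_id]


def pre_filter(ranked_docs: list[dict], tenant_id: str, k: int) -> list[str]:
    """Restrict to the tenant first, then take the top-k."""
    return [d["id"] for d in ranked_docs if d["tenant_id"] == tenant_id][:k]


# Documents already sorted by relevance to some query (toy data)
ranked_docs = [
    {"id": "x1", "tenant_id": "org_456"},
    {"id": "a1", "tenant_id": "org_123"},
    {"id": "x2", "tenant_id": "org_456"},
    {"id": "a2", "tenant_id": "org_123"},
    {"id": "a3", "tenant_id": "org_123"},
]
print(post_filter(ranked_docs, "org_123", k=3))  # ['a1']
print(pre_filter(ranked_docs, "org_123", k=3))   # ['a1', 'a2', 'a3']
```

Both versions keep `org_456` documents out of the answer, but post-filtering silently shrinks the context from three chunks to one.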

---

#### Observability

Log:

* Which retriever contributed which document
* Score breakdown
* Query types

These logs are what you tune α, k, and fusion weights against.
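One way to capture the fields listed above is to carry provenance through fusion and emit one structured log line per query. A minimal sketch, assuming scores are already normalized to [0, 1]; the record layout and the `query_type` label are illustrative, not a standard schema.

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("hybrid_retrieval")


def fuse_with_provenance(bm25: dict[str, float], vector: dict[str, float], alpha: float = 0.5) -> list[dict]:
    """Fuse normalized scores while recording which retriever found each document."""
    records = []
    for doc_id in sorted(set(bm25) | set(vector)):
        kw, vec = bm25.get(doc_id, 0.0), vector.get(doc_id, 0.0)
        records.append({
            "doc_id": doc_id,
            "sources": [name for name, s in (("bm25", bm25), ("vector", vector)) if doc_id in s],
            "keyword_score": kw,
            "vector_score": vec,
            "final_score": alpha * vec + (1 - alpha) * kw,
        })
    records.sort(key=lambda r: r["final_score"], reverse=True)
    return records


def log_retrieval(query: str, query_type: str, records: list[dict]) -> None:
    # One JSON line per query is easy to aggregate in most log pipelines
    logger.info(json.dumps({"query": query, "query_type": query_type, "results": records}))


records = fuse_with_provenance({"d1": 1.0, "d3": 0.4}, {"d1": 0.8, "d2": 1.0})
log_retrieval("error 500", "identifier", records)
```

Aggregating these lines by `query_type` shows, for example, whether identifier-style queries are being answered mostly by BM25.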

---

#### Cost Control

By putting more relevant chunks in the prompt, hybrid search tends to reduce:

* LLM hallucinations
* Long prompts
* Retry cost

It increases:

* Retrieval compute (acceptable tradeoff)

---

### Common Mistakes

#### Using vector-only search in production

❌ Misses exact matches

#### No reranking

❌ Noisy context

#### Large k values

❌ Latency spikes

#### No score normalization

❌ Unstable ranking

---

### When to Use Hybrid Search

* Enterprise RAG
* IT support / tickets
* Legal / compliance docs
* Product documentation
* Any high-accuracy system

---

### When Hybrid Search Is Overkill

* Small datasets
* Prototypes
* Pure semantic discovery tasks

---

### Interview-Ready Summary

> “Hybrid search combines vector-based semantic retrieval with keyword-based lexical search to achieve high recall and high precision. It is the standard retrieval strategy in production RAG systems.”

---

### Rule of Thumb

* **Prototype → Vector search**
* **Production → Hybrid search**
* **Accuracy-critical → Hybrid + reranker**
* **Exact terms matter → Always hybrid**