# Demo #3: Hybrid Search - Combining Semantic and Keyword Retrieval

## Overview

This demo shows how **Hybrid Search** combines dense vector search (semantic) with sparse keyword-based retrieval (BM25) to improve retrieval precision across diverse query types.

### Key Concepts

1. **Dense Vector Search (Semantic)**: Uses embeddings to find semantically similar documents. Great for conceptual queries.
2. **BM25 (Sparse/Keyword)**: Keyword-based ranking that extends TF-IDF-style weighting with term-frequency saturation and document-length normalization. Excels at exact term matches.
3. **Reciprocal Rank Fusion (RRF)**: Merges ranked results from multiple retrievers by combining their reciprocal ranks.

### Why Hybrid Search?

- **Semantic search** may miss documents with exact technical terms if they're not semantically close
- **Keyword search** fails on conceptual queries where terms don't match exactly
- **Hybrid approach** fuses both rankings, so a relevant document can surface if either retriever ranks it highly

### Citations

- **Hybrid Search Technique**: "What is Retrieval-Augmented Generation (RAG)? | Google Cloud"
- **RRF Algorithm**: "Using Hybrid Search to Deliver Fast and Contextually Relevant Results" (Ragie)
- **Best Practices**: "Advanced RAG Implementation using Hybrid Search and Reranking" (Medium)

### Recommended Hugging Face Resources

- **BAAI/bge-m3**: Native support for dense, sparse, and hybrid search - [Link](https://hf.co/BAAI/bge-m3)
- **Paper**: "Vietnamese Legal Information Retrieval" - RRF implementation example - [Link](https://hf.co/papers/2409.13699)

## Setup and Imports

In [None]:
# Install required packages
!pip install -q llama-index llama-index-llms-azure-openai llama-index-embeddings-azure-openai
!pip install -q llama-index-retrievers-bm25 rank-bm25
!pip install -q python-dotenv

In [None]:
import os
from dotenv import load_dotenv
import warnings
warnings.filterwarnings('ignore')

# LlamaIndex core imports
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    Settings
)
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.retrievers import QueryFusionRetriever
from llama_index.core.query_engine import RetrieverQueryEngine

# Azure OpenAI imports
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding

# BM25 retriever
from llama_index.retrievers.bm25 import BM25Retriever

# For visualization
import pandas as pd
from typing import List

## Configure Azure OpenAI

In [None]:
# Load environment variables
load_dotenv()

# Initialize Azure OpenAI LLM
azure_llm = AzureOpenAI(
    model="gpt-4",
    deployment_name=os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_version=os.getenv("AZURE_OPENAI_API_VERSION", "2024-02-15-preview"),
    temperature=0.1
)

# Initialize Azure OpenAI Embedding Model
azure_embed = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    deployment_name=os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_version=os.getenv("AZURE_OPENAI_API_VERSION", "2024-02-15-preview"),
)

# Set global defaults
Settings.llm = azure_llm
Settings.embed_model = azure_embed
Settings.chunk_size = 512
Settings.chunk_overlap = 50

print("✓ Azure OpenAI configured successfully")

## Load and Chunk Technical Documents

We'll use technical documents that contain:
- Specific technical terms and acronyms (BERT, GPT-4, API, Docker)
- Conceptual information about how these technologies work

This mix is perfect for testing hybrid search effectiveness.

In [None]:
# Load documents from tech_docs directory
# Adjust path based on your project structure
data_path = "../RAG_v2/data/tech_docs"

documents = SimpleDirectoryReader(data_path).load_data()

print(f"Loaded {len(documents)} documents:")
for doc in documents:
    filename = doc.metadata.get('file_name', 'Unknown')
    print(f"  - {filename}")

In [None]:
# Parse documents into chunks
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)
nodes = splitter.get_nodes_from_documents(documents)

print(f"\n✓ Created {len(nodes)} text chunks")
print(f"\nExample chunk:")
print("=" * 80)
print(nodes[0].get_content()[:300] + "...")
print("=" * 80)

## Approach #1: Pure Semantic Search (Baseline)

Build a standard vector index using only dense embeddings.
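
To make the baseline concrete, here is a minimal sketch of what a dense retriever does conceptually: score every chunk against the query vector and keep the top k. It assumes cosine similarity as the scoring function and uses hypothetical 3-dimensional vectors (`toy_doc_vecs`, `toy_query_vec`) in place of real embeddings. It is an illustration only, not how `VectorStoreIndex` is implemented.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_by_cosine(query_vec, doc_vecs, k):
    """Return the k (doc_id, score) pairs with the highest cosine similarity."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical toy "embeddings"; real models produce far more dimensions.
toy_doc_vecs = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.6, 0.8, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
toy_query_vec = [1.0, 0.0, 0.0]

toy_top2 = top_k_by_cosine(toy_query_vec, toy_doc_vecs, k=2)
for doc_id, score in toy_top2:
    print(f"{doc_id}: {score:.2f}")
```

The ranking depends only on vector direction, which is why a chunk can score well without sharing any literal words with the query.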

In [None]:
# Create vector index
vector_index = VectorStoreIndex(nodes, embed_model=azure_embed)

# Create vector-only retriever
vector_retriever = vector_index.as_retriever(similarity_top_k=5)

# Create query engine (from_args wires the LLM into a response synthesizer)
vector_engine = RetrieverQueryEngine.from_args(
    retriever=vector_retriever,
    llm=azure_llm
)

print("✓ Vector search baseline ready")

## Approach #2: BM25 Keyword Search

Sparse retrieval that ranks documents by keyword overlap using BM25 scoring (term frequency, inverse document frequency, and document-length normalization).
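
As a rough illustration of the scoring idea, the sketch below implements one common BM25 variant in plain Python. The parameter values `k1=1.5` and `b=0.75`, the whitespace tokenization, and the toy documents are all assumptions made for this example; the `BM25Retriever` used in the next cell has its own tokenization and implementation details.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, tokenized_docs, k1=1.5, b=0.75):
    """Score every document against the query with one common BM25 variant."""
    n_docs = len(tokenized_docs)
    avg_len = sum(len(doc) for doc in tokenized_docs) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in tokenized_docs for term in set(doc))

    scores = []
    for doc in tokenized_docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            tf = term_freq[term]
            if tf == 0:
                continue  # terms absent from the document contribute nothing
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # k1 makes term frequency saturate; b controls length normalization.
            length_norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores

# Hypothetical toy corpus, tokenized by whitespace.
toy_docs = [
    "bert is a transformer encoder".split(),
    "docker packages applications into containers".split(),
    "gpt-4 is a large language model".split(),
]
toy_scores = bm25_scores("what is bert".split(), toy_docs)
for doc, score in zip(toy_docs, toy_scores):
    print(f"{score:.3f}  {' '.join(doc)}")
```

Only the first document contains the rare term "bert", so it scores highest; the Docker document shares no query term and scores zero.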

In [None]:
# Initialize BM25 retriever
bm25_retriever = BM25Retriever.from_defaults(
    nodes=nodes,
    similarity_top_k=5
)

# Create query engine (from_args wires the LLM into a response synthesizer)
bm25_engine = RetrieverQueryEngine.from_args(
    retriever=bm25_retriever,
    llm=azure_llm
)

print("✓ BM25 keyword search ready")

## Approach #3: Hybrid Search with Reciprocal Rank Fusion

Combine both retrievers using QueryFusionRetriever with RRF merging.

In [None]:
# Create hybrid retriever with Reciprocal Rank Fusion
hybrid_retriever = QueryFusionRetriever(
    retrievers=[vector_retriever, bm25_retriever],
    similarity_top_k=5,
    num_queries=1,  # Use original query only (no query generation)
    mode="reciprocal_rerank",  # RRF merging strategy
    use_async=False
)

# Create query engine (from_args wires the LLM into a response synthesizer)
hybrid_engine = RetrieverQueryEngine.from_args(
    retriever=hybrid_retriever,
    llm=azure_llm
)

print("✓ Hybrid search with RRF ready")

## Test Query #1: Exact Term Match ("What is BERT?")

This query requires finding the specific acronym "BERT" in the documents.
- **Expected**: BM25 should excel due to exact keyword matching
- **Challenge**: Semantic search might return related but not exact matches

In [None]:
query1 = "What is BERT?"

print(f"Query: {query1}")
print("=" * 80)

# Test all three approaches
print("\n🔵 SEMANTIC SEARCH RESULTS:")
print("-" * 80)
vector_nodes = vector_retriever.retrieve(query1)
for i, node in enumerate(vector_nodes, 1):
    print(f"\nRank {i} (Score: {node.score:.4f}):")
    print(f"Source: {node.node.metadata.get('file_name', 'Unknown')}")
    print(f"Text: {node.node.get_content()[:200]}...")

print("\n\n🟢 BM25 KEYWORD SEARCH RESULTS:")
print("-" * 80)
bm25_nodes = bm25_retriever.retrieve(query1)
for i, node in enumerate(bm25_nodes, 1):
    print(f"\nRank {i} (Score: {node.score:.4f}):")
    print(f"Source: {node.node.metadata.get('file_name', 'Unknown')}")
    print(f"Text: {node.node.get_content()[:200]}...")

print("\n\n🟣 HYBRID SEARCH RESULTS (RRF):")
print("-" * 80)
hybrid_nodes = hybrid_retriever.retrieve(query1)
for i, node in enumerate(hybrid_nodes, 1):
    print(f"\nRank {i} (Score: {node.score:.4f}):")
    print(f"Source: {node.node.metadata.get('file_name', 'Unknown')}")
    print(f"Text: {node.node.get_content()[:200]}...")

### Generate Final Answers for Query #1

In [None]:
print("\n" + "=" * 80)
print("FINAL ANSWERS COMPARISON")
print("=" * 80)

print("\n🔵 SEMANTIC SEARCH ANSWER:")
print("-" * 80)
vector_response = vector_engine.query(query1)
print(vector_response.response)

print("\n\n🟢 BM25 KEYWORD SEARCH ANSWER:")
print("-" * 80)
bm25_response = bm25_engine.query(query1)
print(bm25_response.response)

print("\n\n🟣 HYBRID SEARCH ANSWER:")
print("-" * 80)
hybrid_response = hybrid_engine.query(query1)
print(hybrid_response.response)

## Test Query #2: Conceptual Query ("How do transformer models work?")

This query requires semantic understanding, not just keyword matching.
- **Expected**: Semantic search should excel
- **Challenge**: BM25 needs exact word matches for "transformer", "models", "work"
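
The challenge can be seen with a small token-overlap check. This is a simplified sketch: the two passages and the tiny stopword list are hypothetical, and a real BM25 pipeline may also apply stemming, which this example does not.

```python
import re

# Tiny illustrative stopword list (an assumption for this example).
STOPWORDS = {"how", "do", "the", "of", "by"}

def content_tokens(text):
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tok for tok in tokens if tok not in STOPWORDS}

query_tokens = content_tokens("How do transformer models work?")

# Hypothetical passages: one repeats the query wording, one paraphrases it.
literal_passage = "Transformer models work by stacking attention layers."
paraphrased_passage = "Self-attention lets every token weigh the relevance of other tokens."

overlap_literal = query_tokens & content_tokens(literal_passage)
overlap_paraphrase = query_tokens & content_tokens(paraphrased_passage)

print("Literal overlap:", sorted(overlap_literal))
print("Paraphrase overlap:", sorted(overlap_paraphrase))
```

The paraphrased passage is on topic but shares no content word with the query, so a purely keyword-based scorer has nothing to match on.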

In [None]:
query2 = "How do transformer models work?"

print(f"Query: {query2}")
print("=" * 80)

# Test all three approaches
print("\n🔵 SEMANTIC SEARCH RESULTS:")
print("-" * 80)
vector_nodes_q2 = vector_retriever.retrieve(query2)
for i, node in enumerate(vector_nodes_q2, 1):
    print(f"\nRank {i} (Score: {node.score:.4f}):")
    print(f"Source: {node.node.metadata.get('file_name', 'Unknown')}")
    print(f"Text: {node.node.get_content()[:200]}...")

print("\n\n🟢 BM25 KEYWORD SEARCH RESULTS:")
print("-" * 80)
bm25_nodes_q2 = bm25_retriever.retrieve(query2)
for i, node in enumerate(bm25_nodes_q2, 1):
    print(f"\nRank {i} (Score: {node.score:.4f}):")
    print(f"Source: {node.node.metadata.get('file_name', 'Unknown')}")
    print(f"Text: {node.node.get_content()[:200]}...")

print("\n\n🟣 HYBRID SEARCH RESULTS (RRF):")
print("-" * 80)
hybrid_nodes_q2 = hybrid_retriever.retrieve(query2)
for i, node in enumerate(hybrid_nodes_q2, 1):
    print(f"\nRank {i} (Score: {node.score:.4f}):")
    print(f"Source: {node.node.metadata.get('file_name', 'Unknown')}")
    print(f"Text: {node.node.get_content()[:200]}...")

### Generate Final Answers for Query #2

In [None]:
print("\n" + "=" * 80)
print("FINAL ANSWERS COMPARISON")
print("=" * 80)

print("\n🔵 SEMANTIC SEARCH ANSWER:")
print("-" * 80)
vector_response_q2 = vector_engine.query(query2)
print(vector_response_q2.response)

print("\n\n🟢 BM25 KEYWORD SEARCH ANSWER:")
print("-" * 80)
bm25_response_q2 = bm25_engine.query(query2)
print(bm25_response_q2.response)

print("\n\n🟣 HYBRID SEARCH ANSWER:")
print("-" * 80)
hybrid_response_q2 = hybrid_engine.query(query2)
print(hybrid_response_q2.response)

## Test Query #3: Mixed Query ("What are Docker containers used for in API development?")

This query combines specific technical terms (Docker, API) with conceptual understanding (usage, purpose).
- **Expected**: Hybrid search should excel by leveraging both approaches

In [None]:
query3 = "What are Docker containers used for in API development?"

print(f"Query: {query3}")
print("=" * 80)

# Test all three approaches
print("\n🔵 SEMANTIC SEARCH RESULTS:")
print("-" * 80)
vector_nodes_q3 = vector_retriever.retrieve(query3)
for i, node in enumerate(vector_nodes_q3, 1):
    print(f"\nRank {i} (Score: {node.score:.4f}):")
    print(f"Source: {node.node.metadata.get('file_name', 'Unknown')}")
    print(f"Text: {node.node.get_content()[:200]}...")

print("\n\n🟢 BM25 KEYWORD SEARCH RESULTS:")
print("-" * 80)
bm25_nodes_q3 = bm25_retriever.retrieve(query3)
for i, node in enumerate(bm25_nodes_q3, 1):
    print(f"\nRank {i} (Score: {node.score:.4f}):")
    print(f"Source: {node.node.metadata.get('file_name', 'Unknown')}")
    print(f"Text: {node.node.get_content()[:200]}...")

print("\n\n🟣 HYBRID SEARCH RESULTS (RRF):")
print("-" * 80)
hybrid_nodes_q3 = hybrid_retriever.retrieve(query3)
for i, node in enumerate(hybrid_nodes_q3, 1):
    print(f"\nRank {i} (Score: {node.score:.4f}):")
    print(f"Source: {node.node.metadata.get('file_name', 'Unknown')}")
    print(f"Text: {node.node.get_content()[:200]}...")

### Generate Final Answers for Query #3

In [None]:
print("\n" + "=" * 80)
print("FINAL ANSWERS COMPARISON")
print("=" * 80)

print("\n🔵 SEMANTIC SEARCH ANSWER:")
print("-" * 80)
vector_response_q3 = vector_engine.query(query3)
print(vector_response_q3.response)

print("\n\n🟢 BM25 KEYWORD SEARCH ANSWER:")
print("-" * 80)
bm25_response_q3 = bm25_engine.query(query3)
print(bm25_response_q3.response)

print("\n\n🟣 HYBRID SEARCH ANSWER:")
print("-" * 80)
hybrid_response_q3 = hybrid_engine.query(query3)
print(hybrid_response_q3.response)

## Understanding Reciprocal Rank Fusion (RRF)

### How RRF Works

RRF merges rankings from multiple retrievers using the formula:

$$\text{RRF Score} = \sum_{r \in \text{retrievers}} \frac{1}{k + \text{rank}_r(d)}$$

Where:
- $k$ is a constant (typically 60)
- $\text{rank}_r(d)$ is the rank of document $d$ in retriever $r$
- Documents not retrieved by a retriever get a rank of infinity (contributing 0 to the sum)
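
The formula translates almost directly into code. The function below is a minimal sketch written for this explanation (the name `reciprocal_rank_fusion` and the toy document IDs are assumptions); it is not the implementation inside `QueryFusionRetriever`.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of document IDs; returns (doc_id, score) pairs, best first."""
    scores = defaultdict(float)
    for ranked in ranked_lists:
        # Ranks start at 1; a document missing from a list simply adds nothing.
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Two hypothetical ranked lists from two retrievers.
fused = reciprocal_rank_fusion([["x", "y", "z"], ["y", "w"]])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

Only the rank positions enter the computation, so the raw retriever scores never need to share a scale.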

### Why RRF is Effective

1. **No Score Normalization Needed**: Fuses results without reconciling incompatible scoring scales (cosine similarity vs. BM25 scores)
2. **Rank-Based**: Focuses on relative ranking rather than absolute scores
3. **Simple & Robust**: No hyperparameters to tune beyond $k$

### Example

If a document is ranked:
- **#1 in semantic search**: contributes $\frac{1}{60+1} \approx 0.0164$
- **#3 in BM25**: contributes $\frac{1}{60+3} \approx 0.0159$
- **Total RRF Score**: $0.0164 + 0.0159 \approx 0.0323$

A document highly ranked in both gets a high combined score!
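
The arithmetic can be checked in a few lines. The comparison value `single_list_best` is an addition for illustration: it is the score of a document ranked #1 by one retriever and missed entirely by the other.

```python
k = 60
semantic_contribution = 1 / (k + 1)  # ranked #1 by semantic search
bm25_contribution = 1 / (k + 3)      # ranked #3 by BM25
total = semantic_contribution + bm25_contribution
print(f"{semantic_contribution:.4f} + {bm25_contribution:.4f} = {total:.4f}")

# For comparison: a document ranked #1 by only one retriever.
single_list_best = 1 / (k + 1)
print(f"Top of a single list only: {single_list_best:.4f}")
```

With $k = 60$, appearing in both lists at good ranks scores roughly twice as high as topping a single list, which is how RRF rewards agreement between retrievers.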

## Data Flow Visualization

```
Query: "What is BERT?"
         |
         v
    [Sent to both retrievers]
         |
    +-----------+-----------+
    |                       |
    v                       v
[Vector Search]      [BM25 Search]
(Semantic)           (Keyword)
    |                       |
    v                       v
Ranked List 1        Ranked List 2
1. Doc A (0.89)      1. Doc B (12.4)
2. Doc C (0.85)      2. Doc A (11.2)
3. Doc B (0.82)      3. Doc D (10.8)
4. Doc D (0.78)      4. Doc C (9.5)
5. Doc E (0.75)      5. Doc F (8.9)
    |                       |
    +----------+------------+
               |
               v
    [Reciprocal Rank Fusion]
    Calculate: 1/(k+rank) for each doc
               |
               v
      Unified Ranked List
      1. Doc A (ranks 1 and 2)
      2. Doc B (ranks 3 and 1)
      3. Doc C (ranks 2 and 4)
      4. Doc D (ranks 4 and 3)
      5. Doc E (rank 5 in one list; ties with Doc F)
               |
               v
         [LLM Generation]
               |
               v
         Final Answer
```
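
The fused order in the diagram can be reproduced from the two illustrative ranked lists. This is a small sketch assuming $k = 60$; the document labels are the placeholder names from the diagram, not real retrieval output.

```python
from collections import defaultdict

K = 60
vector_ranking = ["A", "C", "B", "D", "E"]  # Ranked List 1 in the diagram
bm25_ranking = ["B", "A", "D", "C", "F"]    # Ranked List 2 in the diagram

diagram_scores = defaultdict(float)
for ranking in (vector_ranking, bm25_ranking):
    for rank, doc in enumerate(ranking, start=1):
        diagram_scores[doc] += 1 / (K + rank)

# Sort document labels by fused score, best first.
diagram_order = sorted(diagram_scores, key=diagram_scores.get, reverse=True)
for doc in diagram_order:
    print(f"Doc {doc}: {diagram_scores[doc]:.4f}")
```

Docs A through D appear in both lists and take the top four places. Docs E and F each appear once at rank 5, so their scores tie and the fifth slot depends on how ties are broken.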

## Performance Analysis

Let's create a comparison table to visualize when each approach excels.

In [None]:
# Create comparison DataFrame
comparison_data = {
    'Query Type': ['Exact Term Match', 'Conceptual Query', 'Mixed Query'],
    'Example': [
        'What is BERT?',
        'How do transformer models work?',
        'What are Docker containers used for in API development?'
    ],
    'Best Approach': ['BM25 / Hybrid', 'Semantic / Hybrid', 'Hybrid'],
    'Why': [
        'Exact keyword "BERT" needs to be matched',
        'Requires semantic understanding of concepts',
        'Benefits from both term matching and semantic understanding'
    ]
}

df = pd.DataFrame(comparison_data)
print("\n" + "=" * 100)
print("WHEN TO USE EACH APPROACH")
print("=" * 100)
print(df.to_string(index=False))
print("=" * 100)

## Key Takeaways

### Strengths of Each Approach

#### 🔵 Semantic Search (Dense Vectors)
- **Best for**: Conceptual queries, paraphrasing, synonyms
- **Strength**: Understands meaning beyond exact words
- **Weakness**: May miss exact technical terms or acronyms

#### 🟢 BM25 (Sparse/Keyword)
- **Best for**: Exact term matches, technical terminology, proper nouns
- **Strength**: Reliable for finding specific words/phrases
- **Weakness**: No semantic understanding, fails on paraphrasing

#### 🟣 Hybrid Search (RRF)
- **Best for**: Production systems with diverse query types
- **Strength**: Combines benefits of both, robust across query types
- **Weakness**: Slightly more complex, requires both retrievers

### Best Practices

1. **Start with Hybrid**: Unless you have a very specific use case, hybrid search provides the most robust solution
2. **Adjust top_k**: For hybrid, retrieve more from each (e.g., 10-20) before fusion to get diverse candidates
3. **Monitor Query Patterns**: Analyze your actual queries to see which approach works best
4. **Consider Re-ranking**: Hybrid search can be combined with re-ranking for even better precision (see Demo #5)
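
The effect of candidate depth (practice 2) can be illustrated with plain RRF over two hypothetical rankings. The helper `rrf_top_n` and the document IDs are assumptions made for this sketch; in the notebook itself, depth corresponds to the `similarity_top_k` passed to each retriever.

```python
from collections import defaultdict

def rrf_top_n(ranked_lists, depth, n, k=60):
    """Truncate each list to `depth` candidates, fuse with RRF, return the top n IDs."""
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked[:depth], start=1):
            scores[doc_id] += 1 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Hypothetical full rankings from each retriever.
vector_full = ["d1", "d2", "d3", "d4", "d5", "d6"]
bm25_full = ["d6", "d7", "d8", "d4", "d2", "d9"]

shallow_top3 = rrf_top_n([vector_full, bm25_full], depth=3, n=3)
deep_top3 = rrf_top_n([vector_full, bm25_full], depth=6, n=3)

print("depth=3:", shallow_top3)
print("depth=6:", deep_top3)
```

At depth 3 the two candidate sets do not overlap, so fusion has no agreement to reward. At depth 6, documents that both retrievers rank moderately well (such as `d4`) rise into the fused top 3.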

### Production Considerations

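- **Latency**: Hybrid search runs two retrievers per query, so compute cost is roughly 2x that of a single retriever. When the retrievers run in parallel, wall-clock latency is bounded by the slower one plus fusion overhead; when they run one after another, their latencies add up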
- **Index Size**: Need both vector index and BM25 index
- **Updates**: Both indices need updating when documents change
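- **Scalability**: Both indices in this demo are built in memory. For very large corpora, evaluate how each one scales separately (for example, a dedicated vector store and an inverted-index search engine) rather than assuming either retriever scales for free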
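
The latency point can be sketched with two stand-in retrievers run in threads. Everything here is hypothetical: the functions `slow_vector_search` and `slow_bm25_search` only sleep to simulate work, and the delays are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_vector_search(query):
    """Stand-in for a dense retriever call (simulated 0.2 s latency)."""
    time.sleep(0.2)
    return ["doc_a", "doc_c"]

def slow_bm25_search(query):
    """Stand-in for a BM25 retriever call (simulated 0.1 s latency)."""
    time.sleep(0.1)
    return ["doc_b", "doc_a"]

def retrieve_concurrently(query):
    """Run both stand-in retrievers in threads and collect their ranked lists."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        vector_future = pool.submit(slow_vector_search, query)
        bm25_future = pool.submit(slow_bm25_search, query)
        return vector_future.result(), bm25_future.result()

start = time.perf_counter()
vector_hits, bm25_hits = retrieve_concurrently("What is BERT?")
elapsed = time.perf_counter() - start
print(f"Both retrievers finished in {elapsed:.2f}s")
```

Run this way, the elapsed time tends toward the slower call rather than the sum of both, while the total work done is still that of two retrievals.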

## Conclusion

Hybrid search with Reciprocal Rank Fusion provides a robust retrieval strategy that:
1. Handles both exact-match and semantic queries effectively
2. Requires no complex score normalization
3. Is simple to implement with QueryFusionRetriever
4. Provides consistently good performance across diverse query types

**Next Steps**: Combine hybrid search with re-ranking (Demo #5) for even better precision!