# Demo #3: Hybrid Search - Combining Semantic and Keyword Retrieval

## Overview

This demo shows how **Hybrid Search** combines semantic (dense vector) search with keyword (sparse/BM25) search so that retrieval holds up across both exact-term and conceptual queries.

### Core Concepts:
- **Semantic Search (Dense Vectors)**: Understanding meaning and context
- **Keyword Search (BM25)**: Exact term matching and statistical relevance
- **Reciprocal Rank Fusion (RRF)**: Merging results from multiple retrievers
- **Hybrid Retrieval**: Combining complementary search paradigms

### Why Hybrid Search Works:

**Semantic search alone** struggles with:
- Exact technical terms, acronyms, product names
- Rare or domain-specific terminology
- Queries requiring precise matches

**Keyword search alone** struggles with:
- Semantic similarity (synonyms, paraphrases)
- Conceptual queries without exact terms
- Understanding context and meaning

**Hybrid search** combines both:
- Semantic search finds conceptually relevant documents
- Keyword search ensures exact matches aren't missed
- RRF merges both result sets using rank positions rather than raw scores

### Demo Structure:
1. Setup with technical documents
2. Test pure semantic search
3. Test pure keyword (BM25) search
4. Implement hybrid search with RRF
5. Comparative evaluation across query types

## 1. Environment Setup and Dependencies

In [None]:
# Install required packages
# Run this cell only once
# !pip install llama-index llama-index-llms-azure-openai llama-index-embeddings-azure-openai llama-index-retrievers-bm25 python-dotenv rank-bm25

In [None]:
# Import required libraries
import os
from dotenv import load_dotenv
from pathlib import Path
from typing import List

# LlamaIndex core components
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    Settings
)
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.retrievers import VectorIndexRetriever, QueryFusionRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.schema import NodeWithScore

# BM25 retriever
from llama_index.retrievers.bm25 import BM25Retriever

# Azure OpenAI components
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding

print("✓ All imports successful")

## 2. Configure Azure OpenAI Connection

In [None]:
# Load environment variables
load_dotenv()

# Azure OpenAI configuration
api_key = os.getenv("AZURE_OPENAI_API_KEY")
azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
api_version = os.getenv("AZURE_OPENAI_API_VERSION", "2024-02-15-preview")
llm_deployment = os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME")
embedding_deployment = os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT")

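# Validate configuration and report exactly which variables are missing
required_settings = {
    "AZURE_OPENAI_API_KEY": api_key,
    "AZURE_OPENAI_ENDPOINT": azure_endpoint,
    "AZURE_OPENAI_DEPLOYMENT_NAME": llm_deployment,
    "AZURE_OPENAI_EMBEDDING_DEPLOYMENT": embedding_deployment,
}
missing = [name for name, value in required_settings.items() if not value]
if missing:
    raise ValueError(f"Missing required Azure OpenAI configuration: {', '.join(missing)}. Check your .env file.")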

print("✓ Azure OpenAI configuration loaded")

In [None]:
# Initialize Azure OpenAI models
azure_llm = AzureOpenAI(
    model="gpt-4",
    deployment_name=llm_deployment,
    api_key=api_key,
    azure_endpoint=azure_endpoint,
    api_version=api_version,
    temperature=0.1,
)

azure_embed = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    deployment_name=embedding_deployment,
    api_key=api_key,
    azure_endpoint=azure_endpoint,
    api_version=api_version,
)

# Set global defaults
Settings.llm = azure_llm
Settings.embed_model = azure_embed
Settings.chunk_size = 512
Settings.chunk_overlap = 50

print("✓ Azure OpenAI models initialized")

## 3. Load Technical Documents

We'll use documents with specific technical terms and acronyms (BERT, GPT-4, API, Docker) to demonstrate the difference between semantic and keyword search.

In [None]:
# Define data directory
data_dir = Path("./data/tech_docs")

# Load documents
documents = SimpleDirectoryReader(
    input_dir=str(data_dir),
    required_exts=['.md']
).load_data()

print(f"✓ Loaded {len(documents)} documents")
for doc in documents:
    filename = Path(doc.metadata.get('file_name', 'unknown')).stem
    print(f"  - {filename} ({len(doc.text)} characters)")

In [None]:
# Create text splitter and parse documents
text_splitter = SentenceSplitter(
    chunk_size=512,
    chunk_overlap=50
)

nodes = text_splitter.get_nodes_from_documents(documents)

print(f"✓ Created {len(nodes)} text chunks")
print("\nSample chunk:")
print(f"Text: {nodes[0].text[:200]}...")
print(f"Source: {Path(nodes[0].metadata.get('file_name', 'unknown')).stem}")

## 4. Pure Semantic Search (Dense Vectors)

First, let's build a standard vector search baseline using Azure OpenAI embeddings.

In [None]:
# Create vector index
vector_index = VectorStoreIndex(
    nodes=nodes,
    embed_model=azure_embed
)

print("✓ Vector index created")

In [None]:
# Create vector retriever
vector_retriever = VectorIndexRetriever(
    index=vector_index,
    similarity_top_k=5
)

# Create query engine from vector retriever
vector_engine = RetrieverQueryEngine(
    retriever=vector_retriever,
    llm=azure_llm
)

print("✓ Semantic search engine created")
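To make the ranking step behind dense retrieval concrete, here is a minimal pure-Python sketch of cosine-similarity ranking. The three-dimensional vectors and the names `toy_embeddings` and `toy_query` are made up for illustration; real Azure OpenAI embeddings have far more dimensions, and the vector index handles this step internally.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings" (not produced by any real model)
toy_embeddings = {
    "bert_doc": [0.9, 0.1, 0.0],
    "docker_doc": [0.0, 0.2, 0.9],
    "word2vec_doc": [0.7, 0.6, 0.1],
}
toy_query = [1.0, 0.2, 0.0]

# Rank documents by similarity to the query, highest first
ranked = sorted(
    toy_embeddings,
    key=lambda name: cosine_similarity(toy_query, toy_embeddings[name]),
    reverse=True,
)
print(ranked)
```

The document whose vector points in nearly the same direction as the query ranks first, regardless of whether it shares any words with the query.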

## 5. Test Semantic Search with Different Query Types

In [None]:
# Define test queries of different types
test_queries = [
    # Exact term queries (should favor keyword search)
    "What is BERT?",
    "Tell me about GPT-4",
    "How do I use a REST API?",
    
    # Conceptual queries (should favor semantic search)
    "How do attention mechanisms work in deep learning?",
    "What is a lightweight way to package applications?",
    "How are words represented as vectors?"
]

# Test with an exact term query
exact_query = test_queries[0]
print(f"Test Query (Exact Term): {exact_query}")
print("=" * 100)

In [None]:
# Retrieve using semantic search
retrieved_nodes = vector_retriever.retrieve(exact_query)

print("\n📊 SEMANTIC SEARCH RESULTS (Dense Vectors)")
print("=" * 100)
for i, node in enumerate(retrieved_nodes, 1):
    print(f"\n[Rank {i}] Score: {node.score:.4f} | Source: {Path(node.metadata.get('file_name', 'unknown')).stem}")
    print(f"{node.text[:250]}...")
    print("-" * 100)

## 6. Implement BM25 Keyword Search

Now let's create a BM25 retriever for keyword-based search.

In [None]:
# Create BM25 retriever
bm25_retriever = BM25Retriever.from_defaults(
    nodes=nodes,
    similarity_top_k=5
)

print("✓ BM25 retriever created")

In [None]:
# Test BM25 with the same query
bm25_nodes = bm25_retriever.retrieve(exact_query)

print(f"\nQuery: {exact_query}")
print("\n🔍 BM25 KEYWORD SEARCH RESULTS (Sparse)")
print("=" * 100)
for i, node in enumerate(bm25_nodes, 1):
    print(f"\n[Rank {i}] Score: {node.score:.4f} | Source: {Path(node.metadata.get('file_name', 'unknown')).stem}")
    print(f"{node.text[:250]}...")
    print("-" * 100)
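For intuition about what a BM25 score is made of, the sketch below implements one common variant of the formula in pure Python. It is an illustration under assumptions, not the code `BM25Retriever` runs: the function `bm25_scores`, the whitespace tokenization, the non-negative IDF form, and the defaults `k1=1.5` and `b=0.75` are all choices made here for the example.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score every tokenized document in corpus_tokens against query_tokens."""
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for doc in corpus_tokens:
        doc_freq.update(set(doc))

    scores = []
    for doc in corpus_tokens:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in term_freq:
                continue
            # Rare terms get a larger inverse document frequency
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # k1 saturates repeated terms; b penalizes long documents
            tf = term_freq[term]
            length_norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores

# Hypothetical toy corpus, tokenized by whitespace
toy_corpus = [
    "bert is a transformer model for language understanding".split(),
    "docker packages applications into lightweight containers".split(),
    "word embeddings represent words as vectors".split(),
]
toy_scores = bm25_scores("what is bert".split(), toy_corpus)
print(toy_scores)
```

Only the first toy document shares tokens with the query, so it is the only one with a non-zero score. This is the exact-match behavior that makes BM25 strong on acronyms and weak on paraphrases.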

## 7. Create Hybrid Retriever with Reciprocal Rank Fusion

Now let's combine both retrievers using QueryFusionRetriever with RRF.

In [None]:
# Create hybrid retriever using Query Fusion
hybrid_retriever = QueryFusionRetriever(
    retrievers=[vector_retriever, bm25_retriever],
    similarity_top_k=5,
    num_queries=1,  # Use original query only (no query generation)
    mode="reciprocal_rerank",  # Use Reciprocal Rank Fusion
    use_async=False
)

print("✓ Hybrid retriever created with Reciprocal Rank Fusion")

In [None]:
# Create query engine from hybrid retriever
hybrid_engine = RetrieverQueryEngine(
    retriever=hybrid_retriever,
    llm=azure_llm
)

print("✓ Hybrid search query engine created")

## 8. Test Hybrid Search

In [None]:
# Retrieve using hybrid search
hybrid_nodes = hybrid_retriever.retrieve(exact_query)

print(f"\nQuery: {exact_query}")
print("\n🚀 HYBRID SEARCH RESULTS (Dense + Sparse + RRF)")
print("=" * 100)
for i, node in enumerate(hybrid_nodes, 1):
    print(f"\n[Rank {i}] Score: {node.score:.4f} | Source: {Path(node.metadata.get('file_name', 'unknown')).stem}")
    print(f"{node.text[:250]}...")
    print("-" * 100)

## 9. Comprehensive Comparison Across Query Types

Let's systematically compare all three approaches across different query types.

In [None]:
def compare_retrieval_approaches(query: str, vector_ret, bm25_ret, hybrid_ret):
    """Compare three retrieval approaches for a given query."""
    
    print(f"\n{'=' * 100}")
    print(f"QUERY: {query}")
    print("=" * 100)
    
    # Retrieve from all three
    vector_nodes = vector_ret.retrieve(query)
    bm25_nodes = bm25_ret.retrieve(query)
    hybrid_nodes = hybrid_ret.retrieve(query)
    
    # Extract sources
    vector_sources = [Path(n.metadata.get('file_name', 'unknown')).stem for n in vector_nodes]
    bm25_sources = [Path(n.metadata.get('file_name', 'unknown')).stem for n in bm25_nodes]
    hybrid_sources = [Path(n.metadata.get('file_name', 'unknown')).stem for n in hybrid_nodes]
    
    # Display top result from each
    print("\n📊 Semantic (Vector) - Top Result:")
    print(f"   Source: {vector_sources[0]}")
    print(f"   Score: {vector_nodes[0].score:.4f}")
    print(f"   Preview: {vector_nodes[0].text[:150]}...")
    
    print("\n🔍 Keyword (BM25) - Top Result:")
    print(f"   Source: {bm25_sources[0]}")
    print(f"   Score: {bm25_nodes[0].score:.4f}")
    print(f"   Preview: {bm25_nodes[0].text[:150]}...")
    
    print("\n🚀 Hybrid (RRF) - Top Result:")
    print(f"   Source: {hybrid_sources[0]}")
    print(f"   Score: {hybrid_nodes[0].score:.4f}")
    print(f"   Preview: {hybrid_nodes[0].text[:150]}...")
    
    # Show ranking comparison
    print("\n📋 Top-5 Source Rankings:")
    print(f"   Semantic: {vector_sources}")
    print(f"   Keyword:  {bm25_sources}")
    print(f"   Hybrid:   {hybrid_sources}")
    
    print("-" * 100)

print("✓ Comparison function defined")

In [None]:
# Test with exact term queries (favor keyword search)
print("\n" + "#" * 100)
print("# EXACT TERM QUERIES (Should favor keyword/BM25 search)")
print("#" * 100)

exact_queries = [
    "What is BERT?",
    "Tell me about Docker containers",
    "How do REST APIs work?"
]

for query in exact_queries:
    compare_retrieval_approaches(query, vector_retriever, bm25_retriever, hybrid_retriever)

In [None]:
# Test with conceptual queries (favor semantic search)
print("\n" + "#" * 100)
print("# CONCEPTUAL QUERIES (Should favor semantic/vector search)")
print("#" * 100)

conceptual_queries = [
    "How do attention mechanisms work in deep learning models?",
    "What's a lightweight way to package and deploy applications?",
    "How are words represented as numerical vectors?"
]

for query in conceptual_queries:
    compare_retrieval_approaches(query, vector_retriever, bm25_retriever, hybrid_retriever)
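Beyond eyeballing the printed rankings, it can help to label each fused result by the retriever that supplied it. The helper below is a hypothetical sketch that works on plain lists of IDs; `attribute_sources` and the IDs `n1` to `n4` are invented for illustration.

```python
def attribute_sources(fused_ids, vector_ids, bm25_ids):
    """Label each fused result by which retriever(s) returned it."""
    vector_set = set(vector_ids)
    bm25_set = set(bm25_ids)
    labels = {}
    for doc_id in fused_ids:
        in_vector = doc_id in vector_set
        in_bm25 = doc_id in bm25_set
        if in_vector and in_bm25:
            labels[doc_id] = "both"
        elif in_vector:
            labels[doc_id] = "semantic only"
        elif in_bm25:
            labels[doc_id] = "keyword only"
        else:
            labels[doc_id] = "neither"
    return labels

# Hypothetical IDs standing in for retrieved chunks
example = attribute_sources(
    fused_ids=["n1", "n2", "n3", "n4"],
    vector_ids=["n1", "n3", "n2"],
    bm25_ids=["n1", "n2", "n4"],
)
for doc_id, label in example.items():
    print(f"{doc_id}: {label}")
```

With the retrievers from this notebook, the ID lists could be built from the retrieved results, for example `[n.node.node_id for n in vector_nodes]`. A query type where most fused results are "keyword only" suggests the embeddings are missing domain terms.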

## 10. Understanding Reciprocal Rank Fusion (RRF)

Let's understand how RRF combines rankings from multiple retrievers.

In [None]:
print("""
╔════════════════════════════════════════════════════════════════════════════════╗
║                    RECIPROCAL RANK FUSION (RRF) EXPLAINED                      ║
╚════════════════════════════════════════════════════════════════════════════════╝

RRF is a simple yet effective method to merge rankings from multiple retrievers.

FORMULA:
────────
RRF_score(doc) = Σ(1 / (k + rank_i(doc)))

Where:
- k = smoothing constant (typically 60) that dampens the advantage of top ranks
- rank_i(doc) = rank of document in retriever i
- Σ = sum across all retrievers

EXAMPLE:
────────
Document A appears at:
  - Rank 1 in Semantic Search
  - Rank 5 in BM25 Search

RRF_score(A) = 1/(60+1) + 1/(60+5)
             = 1/61 + 1/65
             = 0.0164 + 0.0154
             = 0.0318

Document B appears at:
  - Rank 3 in Semantic Search
  - Rank 2 in BM25 Search

RRF_score(B) = 1/(60+3) + 1/(60+2)
             = 1/63 + 1/62
             = 0.0159 + 0.0161
             = 0.0320

Result: Document B ranks higher (0.0320 > 0.0318)

KEY ADVANTAGES:
───────────────
✓ No score normalization needed (different retrievers have different score ranges)
✓ Simple and interpretable
✓ Gives higher weight to top-ranked documents
✓ Balances multiple retrieval signals
✓ Robust to outlier scores, since only rank positions are used

WHY IT WORKS:
─────────────
- Documents appearing in top ranks of MULTIPLE retrievers get boosted
- Single retriever mistakes are mitigated
- Leverages strengths of different retrieval paradigms
- Documents relevant by MULTIPLE criteria rise to the top
""")

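The RRF arithmetic for Documents A and B can be checked in a few lines. This is a minimal sketch of the formula only; `rrf_score` is a helper name chosen here, not a LlamaIndex function. It also shows how the constant k affects the outcome.

```python
def rrf_score(ranks, k=60):
    """Sum 1 / (k + rank) over the ranks a document received (ranks start at 1)."""
    return sum(1.0 / (k + rank) for rank in ranks)

# Document A: rank 1 (semantic) and rank 5 (BM25)
# Document B: rank 3 (semantic) and rank 2 (BM25)
score_a = rrf_score([1, 5])
score_b = rrf_score([3, 2])
print(f"k=60: A={score_a:.4f}, B={score_b:.4f}")

# With a much smaller k, a single rank-1 hit dominates and A overtakes B
small_k_a = rrf_score([1, 5], k=1)
small_k_b = rrf_score([3, 2], k=1)
print(f"k=1:  A={small_k_a:.4f}, B={small_k_b:.4f}")
```

With k=60, Document B edges out Document A because it sits reasonably high in both lists. With k=1, the order flips, which is why k is best thought of as a knob for how much a single top rank should count.
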
## 11. Generate Answers and Compare Quality

In [None]:
# Test answer generation with all three approaches
test_query = "What is BERT and how does it differ from previous language models?"

print(f"\nQuery: {test_query}")
print("=" * 100)

# Semantic search answer
vector_response = vector_engine.query(test_query)
print("\n📊 SEMANTIC SEARCH ANSWER:")
print(vector_response.response)
print(f"\nSources: {[Path(n.metadata.get('file_name', 'unknown')).stem for n in vector_response.source_nodes]}")

# Hybrid search answer
print("\n" + "=" * 100)
hybrid_response = hybrid_engine.query(test_query)
print("\n🚀 HYBRID SEARCH ANSWER:")
print(hybrid_response.response)
print(f"\nSources: {[Path(n.metadata.get('file_name', 'unknown')).stem for n in hybrid_response.source_nodes]}")

print("\n" + "=" * 100)

## 12. Visualize Data Flow

In [None]:
print("""
╔════════════════════════════════════════════════════════════════════════════════╗
║                         HYBRID SEARCH DATA FLOW                                ║
╚════════════════════════════════════════════════════════════════════════════════╝

SEMANTIC SEARCH (Dense Vectors):
─────────────────────────────────

Query: "What is BERT?"
   │
   ▼
Embed Query with Azure OpenAI
   │
   ▼
Vector: [0.023, -0.145, 0.891, ...] (1536 dimensions)
   │
   ▼
Cosine Similarity with Document Embeddings
   │
   ▼
Ranked Results (based on semantic similarity)


KEYWORD SEARCH (BM25):
──────────────────────

Query: "What is BERT?"
   │
   ▼
Tokenize: ["what", "is", "bert"]  (typically lowercased; stopwords may also be dropped)
   │
   ▼
BM25 Scoring:
  - Term frequency in document
  - Inverse document frequency
  - Document length normalization
   │
   ▼
Ranked Results (based on term statistics)


HYBRID SEARCH WITH RRF:
────────────────────────

Query: "What is BERT?"
         │
         ├──────────────────────┐
         │                      │
         ▼                      ▼
  Semantic Search         BM25 Search
         │                      │
         ▼                      ▼
  [Doc A, Doc C, Doc B]   [Doc A, Doc B, Doc D]
  (Ranks: 1, 2, 3)        (Ranks: 1, 2, 3)
         │                      │
         └──────────┬───────────┘
                    ▼
         Reciprocal Rank Fusion
                    │
                    ▼
         Doc A: 1/61 + 1/61 = 0.0328  ← Top (in both at rank 1)
         Doc B: 1/63 + 1/62 = 0.0320
         Doc C: 1/62 + 0    = 0.0161
         Doc D: 0    + 1/63 = 0.0159
                    │
                    ▼
         Final Ranking: [Doc A, Doc B, Doc C, Doc D]
                    │
                    ▼
              LLM Generation
                    │
                    ▼
              Final Answer


KEY INSIGHT:
────────────
Documents appearing high in BOTH rankings get the highest combined scores.
Documents found by only one retriever (Doc C, Doc D) still appear, just lower down.
""")

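The fusion step in the diagram can be reproduced on the two example rankings. This is a sketch of rank fusion over plain lists, assuming k=60 and 1-based ranks; `reciprocal_rank_fusion` is written here for illustration and is not the internal implementation of `QueryFusionRetriever`.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list of (id, score) pairs."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document missing from a list simply gets no contribution from it
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

semantic_ranking = ["Doc A", "Doc C", "Doc B"]
bm25_ranking = ["Doc A", "Doc B", "Doc D"]

fused = reciprocal_rank_fusion([semantic_ranking, bm25_ranking])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

The fused order is Doc A, Doc B, Doc C, Doc D, matching the diagram. Doc C and Doc D are kept even though only one retriever returned them.
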
## 13. Key Takeaways and Best Practices

### What We Learned:

1. **Complementary strengths**: Semantic search excels at conceptual queries, while BM25 excels at exact matches.

2. **RRF is simple and effective**: No score normalization needed, just rank-based fusion.

3. **Hybrid search is robust**: Works well across diverse query types without query classification.

4. **Best of both worlds**: Hybrid search ranks highest the documents that both retrievers agree on, while still surfacing strong hits found by only one.

### When to Use Hybrid Search:

✅ **Use when:**
- Query types are diverse (mix of exact terms and conceptual)
- Domain has important acronyms, product names, or technical terms
- Can't predict query patterns in advance
- Retrieval precision is critical
- Want robust performance without query classification

❌ **May not need when:**
- All queries are purely semantic/conceptual
- Documents don't contain specific terms that require exact matching
- Computational budget is extremely limited
- Simple vector search is already performing well

### Best Practices:

1. **Test both retrievers separately first**: Understand their individual strengths and weaknesses.

2. **Tune top-k appropriately**: Start with 5-10 results from each retriever.

3. **Monitor retrieval quality**: Track which retriever contributes more to final rankings.

4. **Consider weighted fusion**: Some implementations allow weighting retrievers differently.

5. **Preprocessing matters**: Ensure consistent text normalization across both retrievers.

6. **BM25 hyperparameters**: Tune k1 (term saturation) and b (length normalization) for your domain.
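
As one way to picture the weighted fusion mentioned in point 4, the sketch below scales each retriever's RRF contribution by a weight. The function `weighted_rrf`, its `weights` argument, and the document IDs are hypothetical; check your library's documentation for how (or whether) it exposes retriever weights.

```python
def weighted_rrf(rankings, weights, k=60):
    """Fuse ranked lists, scaling each list's contribution by its weight."""
    if len(rankings) != len(weights):
        raise ValueError("Provide exactly one weight per ranking")
    scores = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Return IDs only, best first
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["doc_a", "doc_b", "doc_c"]
keyword = ["doc_b", "doc_c", "doc_a"]

equal = weighted_rrf([semantic, keyword], weights=[1.0, 1.0])
semantic_heavy = weighted_rrf([semantic, keyword], weights=[2.0, 1.0])
print("Equal weights:  ", equal)
print("Semantic-heavy: ", semantic_heavy)
```

With equal weights `doc_b` comes first; doubling the semantic weight moves `doc_a` to the top. Weights like these are worth tuning against a labeled query set rather than by intuition.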

### Trade-offs:

**Advantages:**
- Better precision across diverse queries
- More robust to query variations
- Leverages multiple relevance signals

**Costs:**
- Higher computational cost (two retrievers)
- Slightly increased latency
- More complex to debug and optimize

### Next Steps:

In Demo #4, we'll explore **Hierarchical Retrieval** using the Parent Document Retriever pattern to solve the chunking trade-off problem.