# Retrieval Methods Tutorial

This notebook demonstrates various retrieval methods for RAG systems using vector databases. Each method serves different use cases and has specific advantages.

## Overview of Methods

1. **Basic Similarity Search** - Standard semantic search
2. **MMR (Maximal Marginal Relevance)** - Balance relevance and diversity
3. **Top-k Thresholding** - Quantity-based filtering
4. **Adaptive/Dynamic Retrieval** - Flexible result counts
5. **Metadata Filtering** - Context-aware search
6. **Document Chunk Linking** - Multi-document retrieval
7. **LLM-Guided Filtering** - Intelligent pre-filtering
8. **Reranking** - Reorder results for higher precision  
9. **Hybrid Retrieval** - Combine keyword and semantic search  



## Setup


In [None]:
from collections import Counter
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, Filter, FieldCondition, MatchValue, MatchText
from retrieval_playground.utils import config, constants
from retrieval_playground.utils.model_manager import model_manager
from retrieval_playground.src.pre_retrieval.chunking_strategies import ChunkingStrategy
from langchain.prompts import PromptTemplate
from typing import Dict, List, Any
import json
import gc

import logging
logging.getLogger().setLevel(logging.WARNING)

In [None]:
# Load test queries
def load_test_queries() -> List[Dict[str, Any]]:
    """Load test queries from JSON file."""
    queries_path = config.TESTS_DIR / "test_queries.json"
    with open(queries_path, 'r') as f:
        return json.load(f)

test_queries = load_test_queries()
sample_query = test_queries[0]["user_input"]
print(f"Sample query: {sample_query}")


In [None]:
# Setup vector database
strategy = ChunkingStrategy.UNSTRUCTURED
qdrant_client = QdrantClient(url=constants.QDRANT_URL, api_key=constants.QDRANT_KEY)
embeddings = model_manager.get_embeddings()

vector_store = QdrantVectorStore(
    client=qdrant_client,
    collection_name=strategy.value,
    embedding=embeddings
)

print("Vector store initialized!")


In [None]:
# Common parameters
TOP_K = 3
SCORE_THRESHOLD = 0.5


## 1. Basic Similarity Search

**What it is:**  Standard cosine similarity between query and document embeddings.

**When to use:**
- Default choice for most RAG applications
- When you need the most semantically similar content
- Simple, fast, and reliable

**✅ Pros:** Fast, simple, works well for most cases  
**⚠️ Cons:** May return very similar/duplicate content
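
As a rough illustration of what the vector store computes for you, here is a minimal pure-Python sketch of cosine-similarity ranking. The helper names and the 2-D toy vectors are made up for this example; real embeddings have hundreds of dimensions, and the actual metric depends on how the collection was configured.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, doc_vecs, k=3):
    """Return (index, score) pairs for the k most similar documents."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 2-D "embeddings" (hypothetical values)
toy_query = [1.0, 0.0]
toy_docs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
toy_ranking = rank_by_similarity(toy_query, toy_docs, k=2)
print(toy_ranking)
```

The document pointing in the same direction as the query ranks first, and the orthogonal one is dropped by `k=2`.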


In [None]:
# Basic similarity search with scores
context_docs_with_score = vector_store.similarity_search_with_relevance_scores(
    sample_query, k=TOP_K
)

print("📊 Similarity Search Results:")
for i, (doc, score) in enumerate(context_docs_with_score, 1):
    print(f"{i}. Score: {score:.3f} | Source: {doc.metadata['source']} | ChunkID: {doc.metadata['chunk_id']} ")
    print(f"   Preview: {doc.page_content[:50].strip()}...\n")

## 2. MMR (Maximal Marginal Relevance)

**What it is:**  Balances relevance with diversity to avoid near-duplicate results.

**Formula:** `score = λ * similarity(query, doc) - (1 - λ) * max(similarity(doc, selected_docs))`

**When to use:**
- When documents have repetitive/similar content
- Need diverse perspectives on the same topic
- Quality over quantity approach

**✅ Pros:** Reduces redundancy, increases content diversity  
**⚠️ Cons:** May miss highly relevant but similar content
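
The formula above can be sketched as a greedy selection loop: at each step, pick the candidate with the best trade-off between relevance to the query and redundancy with what is already selected. This is a simplified pure-Python illustration, not the vector store's implementation; `mmr_select`, `lambda_mult`, and the toy vectors are assumptions for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr_select(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Greedily pick k document indices by Maximal Marginal Relevance."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy is 0 while nothing has been selected yet
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy vectors (hypothetical): docs 0 and 1 are near-duplicates, doc 2 is different
toy_query = [1.0, 0.2]
toy_docs = [[1.0, 0.0], [0.99, 0.1], [0.7, 0.7]]
diverse = mmr_select(toy_query, toy_docs, k=2, lambda_mult=0.5)
relevance_only = mmr_select(toy_query, toy_docs, k=2, lambda_mult=1.0)
print(diverse, relevance_only)
```

With `lambda_mult=1.0` the selection reduces to plain similarity ranking and returns both near-duplicates. With `0.5`, the second pick switches to the less similar but more novel document.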

In [None]:
# MMR search
query_embedding = embeddings.embed_query(sample_query)
mmr_docs_with_score = vector_store.max_marginal_relevance_search_with_score_by_vector(
    embedding=query_embedding, k=TOP_K
)

print("MMR Search Results:\n")
for i, (doc, score) in enumerate(mmr_docs_with_score, 1):
    print(f"{i}. Score: {score:.3f} | Source: {doc.metadata['source']} | ChunkID: {doc.metadata['chunk_id']}")
    print(f"   Preview: {doc.page_content[:50]}...\n")

In [None]:
print("\nComparison:")
print("Similarity chunk IDs:", [doc[0].metadata['chunk_id'] for doc in context_docs_with_score])
print("MMR chunk IDs:       ", [doc[0].metadata['chunk_id'] for doc in mmr_docs_with_score])


In [None]:
del context_docs_with_score, query_embedding, mmr_docs_with_score

## 3. Top-k Retrieval with Thresholding (Quantity-Oriented)  

**What it is:**  
Return a fixed number of top-k results, optionally filtering out those below a minimum relevance score.  

**When to use:**  
- When you want a consistent number of results  
- When coverage/quantity is more important than strict quality  
- When downstream processing expects a predictable input size  

**✅ Pros:** Predictable result count, broader coverage  
**⚠️ Cons:** May include less-relevant results if quality varies 
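
The logic is small enough to sketch without a vector store: take the k best-scoring hits, then drop any that fall below the threshold. The function name and the toy scores below are illustrative assumptions.

```python
def top_k_with_threshold(scored_docs, k=3, score_threshold=0.5):
    """Keep the k best (doc, score) pairs, then drop those below the threshold."""
    top_k = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)[:k]
    return [(doc, score) for doc, score in top_k if score >= score_threshold]

# Hypothetical (doc_id, relevance score) pairs
toy_scored = [("a", 0.82), ("b", 0.64), ("c", 0.47), ("d", 0.31)]
kept = top_k_with_threshold(toy_scored, k=3, score_threshold=0.5)
print(kept)
```

Here `k=3` would admit "c", but its score is under 0.5, so only two results survive. The result count is bounded above by k, but is not guaranteed to equal it.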

In [None]:
# Score thresholding with fixed k
context_docs_with_score = vector_store.similarity_search_with_relevance_scores(
    sample_query, k=TOP_K
)

retriever_with_threshold = vector_store.as_retriever(
    search_kwargs={"k": TOP_K, "score_threshold": SCORE_THRESHOLD}
)
threshold_docs = retriever_with_threshold.invoke(sample_query)

print(f"📏 Score Threshold Results (min score: {SCORE_THRESHOLD}):")
print(f"Found {len(threshold_docs)} documents above threshold")

# Compare with scores
all_scores = [score for _, score in context_docs_with_score]
above_threshold = [score for score in all_scores if score >= SCORE_THRESHOLD]

print(f"All scores: {all_scores}")
print(f"Above threshold ({SCORE_THRESHOLD}): {above_threshold}")


In [None]:
# Fixed-k retrieval with scores, run on several queries
# to show how the score distribution varies from query to query
print("\n📊 Results for different queries:")
for i, query_data in enumerate(test_queries[:3], 1):
    query = query_data["user_input"]
    docs =  vector_store.similarity_search_with_relevance_scores(query, k=TOP_K)
    scores = [round(doc[1],2) for doc in docs]
    print(f"Query {i}: {len(docs)} results |  Scores: {scores} | Topic: {query[:50]}...")

In [None]:
del context_docs_with_score, retriever_with_threshold, threshold_docs, all_scores, above_threshold

## 4. Dynamic (Adaptive) Thresholding (Quality-Oriented)  

**What it is:**  
Return all results above a relevance score threshold, with no fixed k.  

**When to use:**  
- When ensuring only high-quality results is the priority  
- When query difficulty varies widely  
- When you can handle variable-length input for the LLM  

**✅ Pros:** Ensures higher-quality results, filters out noise  
**⚠️ Cons:** Unpredictable result count, can increase costs for long contexts 
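
To make the variable result count concrete, here is a minimal sketch with made-up scores for two queries. The optional `max_results` cap is an assumption added for this example, as one way to guard against the unpredictable context length mentioned above.

```python
def dynamic_retrieve(scored_docs, score_threshold=0.5, max_results=None):
    """Return every (doc, score) pair at or above the threshold, best first."""
    ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
    kept = [pair for pair in ranked if pair[1] >= score_threshold]
    return kept if max_results is None else kept[:max_results]

# Hypothetical score lists for two queries of different difficulty
toy_results_by_query = {
    "easy query": [("a", 0.91), ("b", 0.77), ("c", 0.68), ("d", 0.55), ("e", 0.52)],
    "hard query": [("f", 0.58), ("g", 0.41), ("h", 0.33)],
}
counts = {q: len(dynamic_retrieve(s)) for q, s in toy_results_by_query.items()}
print(counts)
```

The same threshold yields five results for one query and a single result for the other, which is the behavior this method trades predictability for.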

In [None]:
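# Dynamic retrieval: the score threshold, not k, decides how many results come back.
# If "k" is omitted, the search falls back to its default k (4 in LangChain),
# so pass a generous upper bound (20 is an arbitrary choice here).
dynamic_retriever = vector_store.as_retriever(
    search_kwargs={"k": 20, "score_threshold": SCORE_THRESHOLD}
)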
dynamic_docs = dynamic_retriever.invoke(sample_query)

print(f"🔄 Dynamic Retrieval Results:")
print(f"Found {len(dynamic_docs)} documents above threshold {SCORE_THRESHOLD}")

# Test with different queries to show variability
print("\n📊 Results for different queries:")
for i, query_data in enumerate(test_queries[:3], 1):
    query = query_data["user_input"]
    docs = dynamic_retriever.invoke(query)
    print(f"Query {i}: {len(docs)} results | Topic: {query[:50]}...")

In [None]:
del dynamic_retriever, dynamic_docs

## 5. Metadata Filtering

**What it is:**  Filter search results by document metadata (source, type, date, etc.).

**When to use:**
- Domain-specific searches (e.g., only medical papers)
- Time-based filtering (recent documents only)
- Source credibility filtering
- User permission-based access control

**✅ Pros:** Precise targeting, context control  
**⚠️ Cons:** May miss relevant content from filtered sources
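
As a mental model for how filter clauses combine, here is a pure-Python sketch that mimics `must` (all conditions hold) and `should` (at least one holds) on plain metadata dicts. It is an informal approximation with hypothetical helper names and file names; the real filtering happens inside the vector database during the search.

```python
def matches_filter(metadata, must=None, should=None):
    """True if all `must` conditions hold and, when given, any `should` condition holds."""
    must = must or []
    should = should or []
    if not all(cond(metadata) for cond in must):
        return False
    if should and not any(cond(metadata) for cond in should):
        return False
    return True

def field_equals(key, value):
    """Condition: the metadata field equals the value exactly."""
    return lambda md: md.get(key) == value

def field_contains(key, text):
    """Condition: the metadata field contains the text as a substring."""
    return lambda md: text in str(md.get(key, ""))

# Hypothetical chunk metadata
toy_chunks = [
    {"source": "Statistics_2025_paper_a.pdf", "chunk_id": 0},
    {"source": "Computer_Vision_2025_paper_b.pdf", "chunk_id": 0},
    {"source": "Statistics_2025_paper_c.pdf", "chunk_id": 3},
]
stats_only = [
    c for c in toy_chunks
    if matches_filter(c, must=[field_contains("source", "Statistics_2025")])
]
exact = [
    c for c in toy_chunks
    if matches_filter(c, must=[field_equals("source", "Statistics_2025_paper_a.pdf")])
]
print(len(stats_only), len(exact))
```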


In [None]:
from qdrant_client.http import models as rest

# Create a full-text index on the source field so MatchText filters can use it
qdrant_client.create_payload_index(
    collection_name=strategy.value,
    field_name="metadata.source",
    field_schema=rest.PayloadSchemaType.TEXT
)

In [None]:
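# Example 1: Restrict results to a single source file
# (MatchText performs full-text matching; MatchValue is the strict-equality alternative)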
target_file = "Statistics_2025_Copas-Jackson-type_bounds_for_publication_bias_over_a_general_class_of___selecti.pdf"

exact_filter_retriever = vector_store.as_retriever(
    search_kwargs={
        "k": TOP_K,
        "filter": Filter(
            must=[
                FieldCondition(
                    key="metadata.source",  
                    match=MatchText(text=target_file)
                )
            ]
        )
    }
)

exact_filtered_docs = exact_filter_retriever.invoke(sample_query)
print(f"🎯 Exact Source Filter Results:")
print(f"Target: {target_file[:50]}...")
print(f"Found {len(exact_filtered_docs)} results from this source")

In [None]:
# Example 2: Substring/topic-based filtering
topic_filter_retriever = vector_store.as_retriever(
    search_kwargs={
        "k": TOP_K,
        "filter": Filter(
            must=[
                FieldCondition(
                    key="metadata.source", 
                    match=MatchText(text="Statistics_2025")  # full-text match on part of the file name
                )
            ]
        )
    }
)

topic_filtered_docs = topic_filter_retriever.invoke(sample_query)
print(f"\n📊 Topic Filter Results (Statistics papers):")
print(f"Found {len(topic_filtered_docs)} results")
for doc in topic_filtered_docs:
    print(f"- {doc.metadata['source'][:60]}...")

In [None]:
del target_file, exact_filter_retriever, exact_filtered_docs, topic_filter_retriever, topic_filtered_docs

## 6. Document Chunk Linking

**What it is:**  First find relevant documents, then retrieve more chunks from those same documents.

**When to use:**
- When relevant info might be spread across chunks in same document
- Want comprehensive coverage of relevant documents
- Building document-level understanding

**✅ Pros:** Comprehensive document coverage, maintains context  
**⚠️ Cons:** May include less relevant chunks from relevant documents
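
The two steps can be sketched on an in-memory list of chunks. Note that this variant pulls sibling chunks in reading order (by `chunk_id`) instead of running a second similarity search, and it deduplicates sources so the same document is not expanded twice. `link_chunks`, `seed_k`, `per_source`, and the toy data are assumptions for illustration.

```python
def link_chunks(ranked_chunks, all_chunks, seed_k=2, per_source=3):
    """Expand the top seed_k hits into up to per_source chunks from each hit's document."""
    # Step 1: sources of the top hits, deduplicated with order preserved
    sources = list(dict.fromkeys(c["source"] for c in ranked_chunks[:seed_k]))
    # Step 2: collect sibling chunks from those sources in reading order
    linked = []
    for source in sources:
        siblings = sorted(
            (c for c in all_chunks if c["source"] == source),
            key=lambda c: c["chunk_id"],
        )
        linked.extend(siblings[:per_source])
    return linked

# Hypothetical corpus and relevance-ordered search hits
toy_corpus = [
    {"source": "a.pdf", "chunk_id": 0}, {"source": "a.pdf", "chunk_id": 1},
    {"source": "a.pdf", "chunk_id": 2}, {"source": "b.pdf", "chunk_id": 0},
    {"source": "b.pdf", "chunk_id": 1}, {"source": "c.pdf", "chunk_id": 0},
]
toy_hits = [toy_corpus[1], toy_corpus[2], toy_corpus[3], toy_corpus[5]]
linked = link_chunks(toy_hits, toy_corpus, seed_k=3, per_source=2)
print([(c["source"], c["chunk_id"]) for c in linked])
```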


In [None]:
# Step 1: Find most relevant documents
initial_docs = vector_store.similarity_search(sample_query, k=2)
relevant_sources = [doc.metadata["source"] for doc in initial_docs]

print(f"🔗 Document Chunk Linking:")
print(f"Step 1 - Found relevant documents:")
for source in relevant_sources:
    print(f"- {source[:60]}...")

# Step 2: Get more chunks from these documents
linked_retriever = vector_store.as_retriever(
    search_kwargs={
        "k": TOP_K,
        "filter": Filter(
            should=[  # OR condition across multiple files
                FieldCondition(
                    key="metadata.source",
                    match=MatchText(text=source)
                )
                for source in relevant_sources
            ]
        )
    }
)

linked_docs = linked_retriever.invoke(sample_query)
print(f"\nStep 2 - Retrieved {len(linked_docs)} total chunks from relevant documents")

# Show distribution
source_counts = Counter([doc.metadata['source'] for doc in linked_docs])
for source, count in source_counts.items():
    print(f"- {count} chunks from {source[:50]}...")


In [None]:
del initial_docs, relevant_sources, linked_retriever, linked_docs, source_counts

## 7. LLM-Guided Filtering 

**What it is:**  Use an LLM to classify queries and route to appropriate filtered retrievers.

**When to use:**
- Multi-domain knowledge bases
- Complex query understanding needed
- When simple keyword filtering isn't sufficient
- Domain-specific optimization

**✅ Pros:** Intelligent routing, domain optimization  
**⚠️ Cons:** Added LLM call overhead, potential classification errors
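
Two practical details are worth sketching: LLM replies often come back with stray quotes, punctuation, or casing, and repeated queries should not trigger a new LLM call. The snippet below is one possible approach, using a stand-in function in place of a real LLM. `normalize_category`, `make_cached_classifier`, and `fake_llm_classifier` are hypothetical names introduced for this example.

```python
from functools import lru_cache

ALLOWED_CATEGORIES = ("Computer_Vision", "Other")

def normalize_category(raw: str, default: str = "Other") -> str:
    """Map a raw LLM reply onto one of the allowed labels, else the default."""
    cleaned = raw.strip().strip("\"'.` ").replace(" ", "_").lower()
    for label in ALLOWED_CATEGORIES:
        if cleaned == label.lower():
            return label
    return default

def make_cached_classifier(classify_fn, maxsize=256):
    """Wrap classify_fn so repeated queries are served from an in-memory cache."""
    @lru_cache(maxsize=maxsize)
    def cached(query: str) -> str:
        return normalize_category(classify_fn(query))
    return cached

# Stand-in for an LLM call (hypothetical); records each invocation
call_log = []
def fake_llm_classifier(query: str) -> str:
    call_log.append(query)
    return ' "computer vision". ' if "image" in query.lower() else "Statistics"

classify = make_cached_classifier(fake_llm_classifier)
labels = [
    classify("How are image masks refined?"),
    classify("How are image masks refined?"),  # served from the cache
    classify("What is publication bias?"),
]
print(labels, len(call_log))
```

Falling back to `"Other"` for unrecognized replies keeps routing on the general retriever when the classifier misbehaves, which is usually the safer failure mode.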


In [None]:
# Setup LLM for query classification
llm = model_manager.get_llm()

TOPIC_IDENTIFICATION_TEMPLATE = PromptTemplate(
    input_variables=["query"],
    template="""
Classify the query into one of two topics:

- "Computer_Vision": if related to image/video processing, recognition, detection, segmentation, or OCR.  
- "Other": for everything else.  

Return only the class name.  

Query: {query}  
Topic:
"""
)

def classify_query(query: str) -> str:
    """Classify query into Computer_Vision or Other."""
    category = llm.invoke(
        TOPIC_IDENTIFICATION_TEMPLATE.format(query=query)
    ).content.strip()
    return category

def get_retriever(category: str, k: int = TOP_K):
    """Return retriever based on classification."""
    if category == "Computer_Vision":
        print("🎯 Using Computer Vision filtered retriever\n")
        return vector_store.as_retriever(
            search_kwargs={
                "k": k,
                "filter": Filter(
                    must=[
                        FieldCondition(
                            key="metadata.source", 
                            match=MatchText(text="Computer_Vision")
                        )
                    ]
                )
            }
        )
    else:
        print("🔍 Using general retriever\n")
        return vector_store.as_retriever(
            search_kwargs={"k": k, "score_threshold": SCORE_THRESHOLD}
        )

# -------------------
# Testing
# -------------------
print("🧪 Testing LLM-Guided Filtering:\n")
llm_filtering_queries = [test_queries[i] for i in [0, 3]]  
for i, query in enumerate(llm_filtering_queries, 1):
    q_text = query["user_input"]
    expected = "Computer_Vision" if "Computer_Vision" in query["source_file"] else "Other"

    print(f"\n📋 Test {i}: {q_text[:150]}")

    predicted = classify_query(q_text)
    print(f"Predicted: {predicted} | Expected: {expected}")

    result = "Correctly Classified ✅" if predicted == expected else "Incorrectly Classified ❌"
    print(result)

In [None]:
del llm, TOPIC_IDENTIFICATION_TEMPLATE, llm_filtering_queries

In [None]:
gc.collect()

## 8. Reranking

**What it is:**  
Applies a secondary model (e.g., cross-encoder, LLM, or relevance model) to reorder initial retrieval results, improving relevance ranking beyond similarity scores alone.  

**When to use:**   
- Dense or hybrid retrieval gives many candidates but order matters  
- Need **higher precision** in the top results (e.g., top-3 for RAG context)  
- Queries where subtle semantic nuances are important  

**✅ Pros:**   
- Improves relevance of top results  
- Reduces noise passed to the LLM  
- Works well with hybrid/similarity search as a post-processing step  

**⚠️ Cons:**   
- Higher latency and compute cost  
- Requires additional model training or fine-tuning for best results  
- May not scale well for very large candidate sets  

**Example use case:**  
- Initial retriever returns 20 research paper abstracts, far more than the final K.  
- A cross-encoder reranker re-scores them, surfacing the **most directly relevant Top-K** for the query context.  
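
The retrieve-then-rerank flow can be sketched independently of any model: score each candidate against the query with a second scorer, sort, and keep the top results. The scorer below is a deliberately crude token-overlap placeholder, not a cross-encoder. It only stands in for one so the example runs without downloading a model, and all names and texts are illustrative.

```python
def rerank(query, candidates, score_fn, top_n=3):
    """Re-score every candidate against the query and keep the top_n."""
    rescored = [(doc, score_fn(query, doc)) for doc in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:top_n]

def token_overlap(query, doc):
    """Placeholder scorer: Jaccard overlap of lowercase tokens (NOT a cross-encoder)."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    union = q | d
    return len(q & d) / len(union) if union else 0.0

# Hypothetical first-stage candidates, in their original retrieval order
toy_candidates = [
    "remote sensing images overview",
    "open-vocabulary segmentation for remote sensing images",
    "cooking recipes for beginners",
]
toy_top = rerank(
    "open-vocabulary segmentation remote sensing",
    toy_candidates,
    token_overlap,
    top_n=2,
)
print([doc for doc, _ in toy_top])
```

In a real pipeline, `score_fn` would call the cross-encoder on each (query, document) pair, which is where the extra latency comes from.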



In [None]:
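# See reranking.py for the full implementation.
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CrossEncoderReranker
from langchain_community.cross_encoders import HuggingFaceCrossEncoder

def _setup_reranker_retriever(top_k: int = 20, top_n: int = TOP_K):
    """Initialize the reranking retriever.

    top_k candidates are fetched by the base retriever and the cross-encoder
    keeps the best top_n. The default values are assumptions for this notebook.
    """
    qdrant_path = config.QDRANT_DIR / strategy.value
    qdrant_client = QdrantClient(path=str(qdrant_path))
    embeddings = model_manager.get_embeddings()

    vector_store = QdrantVectorStore(
        client=qdrant_client,
        collection_name=strategy.value,
        embedding=embeddings,
    )

    # Stage 1: wide candidate retrieval by vector similarity
    retriever = vector_store.as_retriever(search_kwargs={"k": top_k})

    # Stage 2: the cross-encoder re-scores the candidates and keeps the top_n
    model = HuggingFaceCrossEncoder(model_name=constants.RERANKER_MODEL)
    compressor = CrossEncoderReranker(model=model, top_n=top_n)
    reranker_retriever = ContextualCompressionRetriever(
        base_compressor=compressor, base_retriever=retriever
    )
    return reranker_retriever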

In [None]:
from retrieval_playground.src.mid_retrieval.reranking import Reranker
reranker = Reranker(qdrant_client=qdrant_client, top_n=TOP_K)
reranker_evaluation_results = reranker.evaluate_reranking()

In [None]:
del reranker, reranker_evaluation_results

## 9. Hybrid Retrieval

**What it is:**   
Combines sparse retrieval (BM25/keyword-based) with dense retrieval (embeddings-based) to leverage both exact keyword matching and semantic similarity.

**How it works:**  
- BM25 ensures keyword precision (great for rare terms, acronyms, or exact matches).  
- Dense retrieval ensures semantic recall (captures meaning even if words differ).  
- Final results are combined (via weighted scores, reranking, or union).  

**When to use:**  
- Queries with a mix of rare keywords and semantic intent  
- Domain-specific content with technical jargon (BM25 helps catch exact terms)  
- General RAG pipelines where coverage + precision both matter  
- When neither sparse nor dense alone gives consistently good results  

**✅ Pros:** Best of both worlds – exact match + semantic understanding  
**⚠️ Cons:** More complex to implement, higher compute cost  

**Example use case:**  

*Query:*  
*“What are the key challenges in adapting existing open-vocabulary semantic segmentation (OVSS) frameworks, designed for natural images, to remote sensing images, and how does SegEarth-OV address these challenges?”*  

*BM25 (Sparse)*: Catches exact terms - *“semantic segmentation”*, *“open-vocabulary”*, *“remote sensing”*, *OVSS*  

*Dense (Semantic)*: Finds paraphrases - *“pixel-level classification for satellite imagery”*, *“domain adaptation from natural to aerial scenes”*, *“generalized segmentation across modalities”*  

*Hybrid*: Returns both - keyword-heavy matches (*“open-vocabulary segmentation for remote sensing”*) + semantic ones (*“SegEarth-OV enables cross-domain satellite segmentation”*).  

*Result:* Covers **precision (keywords)** + **recall (semantic similarity)**.  
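
One of the combination strategies mentioned above, weighted score fusion, can be sketched as follows. BM25 and dense scores live on different scales, so this example min-max normalizes each list before mixing them. The function names, the `dense_weight` parameter, and all scores are assumptions for illustration. Production systems often use rank-based fusion instead.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} dict to the [0, 1] range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def hybrid_fuse(sparse_scores, dense_scores, dense_weight=0.5, top_k=3):
    """Blend normalized sparse and dense scores; a missing score counts as 0."""
    sparse_n = min_max_normalize(sparse_scores)
    dense_n = min_max_normalize(dense_scores)
    fused = {
        doc_id: (1 - dense_weight) * sparse_n.get(doc_id, 0.0)
        + dense_weight * dense_n.get(doc_id, 0.0)
        for doc_id in set(sparse_n) | set(dense_n)
    }
    # Sort by score (descending), then by id so ties resolve deterministically
    return sorted(fused.items(), key=lambda pair: (-pair[1], pair[0]))[:top_k]

# Hypothetical scores: BM25 is unbounded, dense similarity is roughly 0..1
toy_sparse = {"kw_doc": 9.0, "both_doc": 7.0, "noise": 3.0}
toy_dense = {"sem_doc": 0.80, "both_doc": 0.70, "kw_doc": 0.50, "noise": 0.40}
fused_top = hybrid_fuse(toy_sparse, toy_dense, dense_weight=0.5, top_k=3)
print([doc_id for doc_id, _ in fused_top])
```

The document that scores reasonably well on both signals ends up first, ahead of documents that are strong on only one.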



To learn more about implementing hybrid approaches, check out [this article on hybrid search by Qdrant](https://qdrant.tech/articles/hybrid-search/).


## 📋 Method Comparison & Recommendations

### Quick Comparison Table

| Method | Best For | Pros | Cons | Complexity |
|--------|----------|------|------|------------|
| **Similarity Search** | General purpose | Fast, simple, reliable | May return duplicates | Low |
| **MMR** | Diverse content | Reduces redundancy | May miss similar relevant content | Medium |
| **Score Threshold** | Quality control | Ensures minimum quality | May return no results | Low |
| **Dynamic Retrieval** | Variable content needs | Flexible, quality-focused | Unpredictable context length | Low |
| **Metadata Filtering** | Domain-specific | Precise targeting | May miss cross-domain insights | Medium |
| **Chunk Linking** | Document comprehension | Comprehensive coverage | Less relevant chunks included | Medium |
| **LLM-Guided** | Multi-domain systems | Intelligent routing | LLM overhead, classification errors | High |
| **Hybrid Retrieval (BM25 + Dense)** | Balanced semantic + keyword search | Captures both exact matches & semantic meaning | Requires tuning weight between BM25 & dense | High |
| **Reranking** | Precision in top results | Improves top-k relevance, reduces noise | Higher latency & compute cost | High |

---

### 🎯 Recommendations

#### **Start Simple**
Begin with **Similarity Search + Score Threshold** for most applications.

#### **Scale Up Based on Needs**
- Add **MMR** if you notice repetitive results  
- Use **Metadata Filtering** for multi-domain knowledge bases  
- Implement **LLM-Guided Filtering** for complex routing needs  
- Adopt **Hybrid Retrieval (BM25 + Dense)** when queries require both exact keyword matches (e.g., specific terms, codes) and semantic understanding (contextual intent)  
- Add **Reranking** when you need **high precision in the top results** (e.g., top-3 context for LLMs)  

#### **Fine-tuning Tips**
- Adjust `score_threshold` based on your quality requirements (0.3–0.5 is typical)  
- Use **Dynamic Retrieval** when context length flexibility is valuable  
- Apply **Chunk Linking** for document-centric tasks  
- Tune the BM25 vs Dense weighting in **Hybrid Retrieval** (common ranges: 0.3–0.7) depending on whether precision (keywords) or recall (semantics) is more important  
- Use **Reranking** selectively on top-N candidates to balance cost vs accuracy  

#### **Pro Tips**
- Monitor retrieval metrics (precision, recall) to choose optimal methods  
- Consider combining methods (e.g., MMR + Score Threshold, Hybrid + Reranking)  
- Cache LLM classification results for repeated query patterns  
- A/B test different approaches with your specific use case  
- Start with simple methods and add complexity only when needed  
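
For the monitoring tip above, precision@k and recall@k are straightforward to compute once you have a set of known-relevant chunk IDs per query. This is a minimal sketch with hypothetical IDs. Note that precision here divides by the number of results actually returned (at most k), which is one of several conventions in use.

```python
def precision_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the top-k retrieved items that are relevant."""
    top = retrieved_ids[:k]
    if not top:
        return 0.0
    return sum(1 for doc_id in top if doc_id in relevant_ids) / len(top)

def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of all relevant items that appear in the top-k retrieved items."""
    if not relevant_ids:
        return 0.0
    return len(set(retrieved_ids[:k]) & set(relevant_ids)) / len(relevant_ids)

# Hypothetical retrieval output and ground-truth labels for one query
toy_retrieved = ["c1", "c7", "c3", "c9"]
toy_relevant = {"c1", "c3", "c5"}
p3 = precision_at_k(toy_retrieved, toy_relevant, k=3)
r3 = recall_at_k(toy_retrieved, toy_relevant, k=3)
print(f"precision@3={p3:.2f} recall@3={r3:.2f}")
```

Averaging these values over a test query set gives a simple baseline for comparing retrieval methods before and after a change.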

#### **Common Patterns**
- **High-quality RAG**: Similarity Search + Score Threshold + MMR  
- **Multi-domain KB**: LLM-Guided Filtering + Metadata Filtering  
- **Document Analysis**: Chunk Linking + Score Threshold  
- **Exploratory Search**: Dynamic Retrieval + MMR  
- **Balanced Search (keyword + meaning)**: Hybrid Retrieval (BM25 + Dense) + Score Threshold  
- **High-precision Context**: Hybrid Retrieval + Reranking  

#### ✅ Tutorial completed! You now know 9 different retrieval methods.
Ready to build better RAG systems!

Next steps:
- Experiment with different methods on your own data
- Combine methods for optimal results
- Monitor and evaluate retrieval quality
- Scale complexity based on your specific needs
