# Demo #2: Multi-Query and Sub-Query Decomposition

## Overview

This notebook demonstrates advanced query handling techniques in RAG systems by implementing:
- **Sub-query decomposition**: Breaking complex questions into smaller, independent sub-queries
- **Multi-query generation**: Creating multiple query variations from different perspectives
- **Multi-hop reasoning**: Enabling comprehensive information gathering across interconnected topics

### Why Query Decomposition Matters

Complex queries often contain multiple facets or implicit sub-questions. A single retrieval pass may miss critical information because:
- The query is too broad to match specific documents effectively
- Different parts of the answer exist in separate documents
- The query requires synthesizing information from multiple sources

By decomposing complex queries into simpler sub-queries, we can:
- Improve retrieval recall (find more relevant documents)
- Enable multi-hop reasoning (connect information across documents)
- Generate more comprehensive and accurate answers

### Learning Objectives

By the end of this notebook, you will understand:
1. How to detect and decompose complex queries
2. How to run retrieval independently for each sub-query
3. How to aggregate and organize multi-source contexts
4. Advanced techniques like Diverse Multi-Query Rewriting (DMQR-RAG)

## 1. Environment Setup and Installation

In [None]:
# Install required packages
!pip install -q langchain langchain-openai langchain-community langchain-chroma
!pip install -q openai chromadb sentence-transformers tiktoken

In [None]:
# Import required libraries
import os
from typing import List, Dict, Tuple, Any
import warnings
warnings.filterwarnings('ignore')

# LangChain imports
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma  # provided by the langchain-chroma package installed above
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_openai import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.schema import Document

# Other imports
from collections import defaultdict
import json
from datetime import datetime

print("✓ All libraries imported successfully")

In [None]:
# Configure OpenAI API key
# Option 1: Set as environment variable (recommended)
# os.environ["OPENAI_API_KEY"] = "your-api-key-here"

# Option 2: Prompt for the key interactively (for workshop)
from getpass import getpass

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")

print("✓ API key configured")

## 2. Create Sample Knowledge Base

We'll create a knowledge base with interconnected information about smartphones. Answering complex queries against it requires combining facts from several documents (multi-hop reasoning).

In [None]:
# Sample documents about smartphones - designed to require multi-hop reasoning
sample_documents = [
    # iPhone 15 Pro information
    """The iPhone 15 Pro features Apple's A17 Pro chip with a 3nm process, making it incredibly 
    energy efficient. The device includes a 48MP main camera with advanced computational photography 
    capabilities. Apple has introduced a new Action Button replacing the traditional mute switch.""",
    
    """iPhone 15 Pro battery life is rated at up to 23 hours of video playback. The device supports 
    fast charging up to 20W and MagSafe wireless charging at 15W. The battery capacity is approximately 
    3,274 mAh for the standard Pro model.""",
    
    """The iPhone 15 Pro camera system includes a 48MP main camera, 12MP ultra-wide, and a 12MP 
    telephoto with 3x optical zoom. The main camera can capture ProRAW images and supports 4K video 
    at 60fps with ProRes recording capabilities.""",
    
    # Samsung Galaxy S24 Ultra information
    """Samsung Galaxy S24 Ultra is powered by the Snapdragon 8 Gen 3 processor in most markets, 
    built on a 4nm process. The device features Samsung's largest battery in an S-series phone 
    and includes an integrated S Pen for productivity.""",
    
    """The Galaxy S24 Ultra offers exceptional battery life with its 5,000 mAh battery, providing 
    up to 28 hours of video playback. It supports 45W fast charging, 15W wireless charging, and 
    4.5W reverse wireless charging for accessories.""",
    
    """Samsung Galaxy S24 Ultra features a quad camera system with a 200MP main sensor, 12MP ultra-wide, 
    10MP 3x telephoto, and 50MP 5x telephoto. The 200MP sensor uses pixel binning to produce 12MP images 
    with exceptional detail. It can record 8K video at 30fps.""",
    
    # Google Pixel 8 Pro information
    """Google Pixel 8 Pro uses Google's custom Tensor G3 chip, optimized for AI and machine learning 
    tasks. The device features a 6.7-inch LTPO OLED display with adaptive refresh rate up to 120Hz. 
    Google emphasizes computational photography over raw hardware specs.""",
    
    """The Pixel 8 Pro has a 5,050 mAh battery with impressive battery life, achieving up to 24 hours 
    of mixed use. It supports 30W fast charging and 23W wireless charging. The Extreme Battery Saver 
    mode can extend battery life up to 72 hours.""",
    
    """Pixel 8 Pro camera system includes a 50MP main sensor, 48MP ultra-wide, and 48MP telephoto with 
    5x optical zoom. Google's computational photography features include Magic Eraser, Photo Unblur, 
    and Night Sight. The camera can capture 4K video at 60fps.""",
    
    # Comparative information
    """When comparing flagship smartphones in 2024, key factors include processor efficiency, camera 
    versatility, battery capacity, and software ecosystem. The iPhone 15 Pro excels in ecosystem 
    integration, the Galaxy S24 Ultra leads in raw specifications, and the Pixel 8 Pro stands out 
    for AI-powered features.""",
    
    """Battery life in modern smartphones depends not just on battery capacity (mAh) but also on 
    processor efficiency, display technology, and software optimization. A phone with a smaller 
    battery but more efficient processor can often outlast one with a larger battery.""",
    
    """Camera quality in smartphones is determined by multiple factors: sensor size, pixel count, 
    lens quality, optical image stabilization, and computational photography algorithms. Higher 
    megapixels don't always mean better photos - sensor size and processing capabilities matter more.""",
]

print(f"Created knowledge base with {len(sample_documents)} documents")
print(f"Total characters: {sum(len(doc) for doc in sample_documents):,}")

## 3. Initialize Embedding Model and Vector Store

In [None]:
# Initialize embedding model
embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={'device': 'cpu'}
)

print("✓ Embedding model initialized")
print("  Model: sentence-transformers/all-MiniLM-L6-v2")
print("  Embedding dimension: 384")

In [None]:
# Create text splitter for chunking
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,
    chunk_overlap=50,
    length_function=len,
    separators=["\n\n", "\n", ". ", " ", ""]
)

# Convert documents to LangChain Document objects and split
docs = [Document(page_content=doc) for doc in sample_documents]
split_docs = text_splitter.split_documents(docs)

print(f"✓ Documents split into {len(split_docs)} chunks")
print(f"  Average chunk size: {sum(len(doc.page_content) for doc in split_docs) // len(split_docs)} characters")

In [None]:
# Create vector store
vectorstore = Chroma.from_documents(
    documents=split_docs,
    embedding=embedding_model,
    collection_name="smartphone_knowledge"
)

print("✓ Vector store created and indexed")
print(f"  Total chunks indexed: {vectorstore._collection.count()}")

## 4. Initialize LLM for Query Processing and Generation

In [None]:
# Initialize OpenAI LLM
llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.3,
    max_tokens=2000
)

print("✓ LLM initialized")
print("  Model: gpt-4o-mini")
print("  Temperature: 0.3 (focused, consistent responses)")

## 5. Baseline: Naive Single-Query RAG

First, let's implement a baseline naive RAG system that performs a single retrieval for comparison.

In [None]:
def naive_rag(query: str, k: int = 5) -> Dict[str, Any]:
    """
    Baseline naive RAG: single-query retrieval and generation.
    
    Args:
        query: User's question
        k: Number of documents to retrieve
    
    Returns:
        Dictionary containing answer and metadata
    """
    # Retrieve relevant documents
    retrieved_docs = vectorstore.similarity_search(query, k=k)
    
    # Prepare context from retrieved documents
    context = "\n\n".join([doc.page_content for doc in retrieved_docs])
    
    # Create prompt for generation
    prompt = f"""Using the following context, please answer the question. If the context doesn't 
contain enough information to answer fully, acknowledge this in your response.

Context:
{context}

Question: {query}

Answer:"""
    
    # Generate answer
    response = llm.invoke(prompt)
    
    return {
        "query": query,
        "answer": response.content,
        "retrieved_docs": [doc.page_content for doc in retrieved_docs],
        "num_docs": len(retrieved_docs)
    }

print("✓ Naive RAG function defined")

## 6. Sub-Query Decomposition Implementation

Now let's implement the core query decomposition logic.

In [None]:
# Define prompt template for query decomposition
DECOMPOSITION_TEMPLATE = """You are an expert at breaking down complex questions into simpler, 
independent sub-questions.

Your task is to analyze the given question and decompose it into a set of simpler sub-questions that, 
when answered together, would provide a comprehensive answer to the original question.

Guidelines:
1. Each sub-question should be independent and answerable on its own
2. Sub-questions should be specific and focused
3. Cover all aspects of the original question
4. Aim for 2-5 sub-questions depending on complexity
5. Return ONLY a valid JSON array of sub-questions, nothing else

Original Question: {question}

Sub-questions (as JSON array):"""

decomposition_prompt = PromptTemplate(
    template=DECOMPOSITION_TEMPLATE,
    input_variables=["question"]
)

print("✓ Decomposition prompt template created")

In [None]:
def decompose_query(query: str) -> List[str]:
    """
    Decompose a complex query into simpler sub-queries.
    
    Args:
        query: Complex user question
    
    Returns:
        List of sub-questions
    """
    # Format prompt
    prompt = decomposition_prompt.format(question=query)
    
    # Get LLM response
    response = llm.invoke(prompt)
    
    # Parse JSON response
    try:
        # Extract JSON from response (handle markdown code blocks if present)
        response_text = response.content.strip()
        if "```json" in response_text:
            response_text = response_text.split("```json")[1].split("```")[0]
        elif "```" in response_text:
            response_text = response_text.split("```")[1].split("```")[0]
        
        sub_queries = json.loads(response_text)
        
        # Validate that we got a list
        if not isinstance(sub_queries, list):
            raise ValueError("Response is not a list")
        
        return sub_queries
    
    except (json.JSONDecodeError, ValueError) as e:
        print(f"Warning: Failed to parse sub-queries: {e}")
        print(f"Response was: {response.content}")
        # Fallback: return original query
        return [query]

print("✓ Query decomposition function defined")
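
The parsing step inside `decompose_query` can be isolated and tested without calling the LLM. The following is a minimal sketch, assuming a hypothetical helper named `parse_json_string_list` (not part of the notebook or of LangChain). It applies the same fence-stripping logic and additionally checks that every element is a non-empty string, since each item is later passed to `similarity_search`.

```python
import json
from typing import List

FENCE = "`" * 3  # a Markdown code fence, built indirectly to keep this cell readable


def parse_json_string_list(response_text: str) -> List[str]:
    """Extract a JSON array of strings from raw LLM output (hypothetical helper)."""
    text = response_text.strip()

    # Strip a Markdown code fence if the model wrapped its answer in one
    if FENCE + "json" in text:
        text = text.split(FENCE + "json")[1].split(FENCE)[0]
    elif FENCE in text:
        text = text.split(FENCE)[1].split(FENCE)[0]

    parsed = json.loads(text)  # raises json.JSONDecodeError on malformed JSON

    if not isinstance(parsed, list):
        raise ValueError("Response is not a list")

    # Keep only non-empty strings so every item is a usable search query
    cleaned = [item.strip() for item in parsed if isinstance(item, str) and item.strip()]
    if not cleaned:
        raise ValueError("Response contains no usable sub-questions")
    return cleaned


fenced_reply = FENCE + 'json\n["What is the battery life?", "What is the camera quality?"]\n' + FENCE
print(parse_json_string_list(fenced_reply))
```

Because `json.JSONDecodeError` is a subclass of `ValueError`, a caller can catch `ValueError` alone and fall back to the original query, as `decompose_query` does.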

## 7. Multi-Query Retrieval Pipeline

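Retrieve documents for each sub-query independently, then aggregate and deduplicate the results. The loop below runs the searches one after another; because the sub-queries are independent, they could also be run in parallel.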

In [None]:
def retrieve_for_subqueries(
    sub_queries: List[str], 
    k: int = 3
) -> Dict[str, List[Document]]:
    """
    Retrieve documents for each sub-query independently.
    
    Args:
        sub_queries: List of sub-questions
        k: Number of documents to retrieve per sub-query
    
    Returns:
        Dictionary mapping each sub-query to its retrieved documents
    """
    results = {}
    
    for sub_query in sub_queries:
        # Retrieve documents for this sub-query
        docs = vectorstore.similarity_search(sub_query, k=k)
        results[sub_query] = docs
    
    return results

print("✓ Multi-query retrieval function defined")
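
The loop in `retrieve_for_subqueries` issues one search at a time. Since the sub-queries do not depend on each other, the searches could be dispatched concurrently. Here is a minimal sketch using a thread pool; `retrieve_in_parallel` and `fake_search` are hypothetical names, and `fake_search` stands in for `vectorstore.similarity_search` so the example runs without a vector store.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def retrieve_in_parallel(
    sub_queries: List[str],
    search_fn: Callable[[str], List[str]],
    max_workers: int = 4,
) -> Dict[str, List[str]]:
    """Run search_fn for every sub-query on a thread pool (hypothetical helper)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so they line up with sub_queries
        results = list(pool.map(search_fn, sub_queries))
    return dict(zip(sub_queries, results))


# Stand-in for a real vector-store search
def fake_search(query: str) -> List[str]:
    return [f"chunk about {query}"]


parallel_results = retrieve_in_parallel(["battery life", "camera quality"], fake_search)
print(parallel_results)
```

In the notebook, `search_fn` could be something like `lambda q: vectorstore.similarity_search(q, k=3)`. How much time this saves depends on where the time goes: with a local CPU embedding model and an in-memory store the gain may be small, while a remote embedding or search API is more likely to benefit.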

In [None]:
def deduplicate_documents(doc_dict: Dict[str, List[Document]]) -> List[Tuple[str, Document]]:
    """
    Deduplicate retrieved documents, attributing each one to the first sub-query that retrieved it.
    
    Args:
        doc_dict: Dictionary mapping sub-queries to retrieved documents
    
    Returns:
        List of (sub_query, document) tuples with duplicates removed
    """
    seen_content = set()
    deduplicated = []
    
    for sub_query, docs in doc_dict.items():
        for doc in docs:
            # Use content as deduplication key
            content_key = doc.page_content.strip()
            
            if content_key not in seen_content:
                seen_content.add(content_key)
                deduplicated.append((sub_query, doc))
    
    return deduplicated

print("✓ Document deduplication function defined")
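
One consequence of `deduplicate_documents` is that a chunk retrieved by several sub-queries is attributed only to the first one, so a later sub-query can end up with no context of its own. A possible variant, sketched here with plain strings instead of `Document` objects, records every sub-query that retrieved a chunk. The name `deduplicate_with_all_subqueries` is hypothetical.

```python
from typing import Dict, List, Tuple


def deduplicate_with_all_subqueries(
    doc_dict: Dict[str, List[str]]
) -> List[Tuple[List[str], str]]:
    """Return (sub_queries, chunk_text) pairs, one per unique chunk (hypothetical helper)."""
    owners: Dict[str, List[str]] = {}  # chunk text -> sub-queries that retrieved it

    for sub_query, chunks in doc_dict.items():
        for chunk in chunks:
            key = chunk.strip()
            matched = owners.setdefault(key, [])
            if sub_query not in matched:
                matched.append(sub_query)

    # dicts preserve insertion order, so chunks stay in first-retrieved order
    return [(sub_queries, chunk) for chunk, sub_queries in owners.items()]


retrieved = {
    "iPhone battery?": ["chunk A", "chunk B"],
    "Galaxy battery?": ["chunk B", "chunk C"],
}
for sub_queries, chunk in deduplicate_with_all_subqueries(retrieved):
    print(chunk, "<-", sub_queries)
```

With this shape, a synthesis step could list a shared chunk once and still show which sub-questions it supports.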

## 8. Context Synthesis and Answer Generation

In [None]:
def synthesize_answer(
    original_query: str,
    sub_queries: List[str],
    retrieved_docs: List[Tuple[str, Document]]
) -> str:
    """
    Synthesize final answer from sub-query contexts.
    
    Args:
        original_query: Original complex question
        sub_queries: List of sub-questions
        retrieved_docs: List of (sub_query, document) tuples
    
    Returns:
        Final synthesized answer
    """
    # Organize context by sub-query
    context_by_subquery = defaultdict(list)
    for sub_query, doc in retrieved_docs:
        context_by_subquery[sub_query].append(doc.page_content)
    
    # Build structured context string
    context_sections = []
    for i, sub_query in enumerate(sub_queries, 1):
        if sub_query in context_by_subquery:
            docs_text = "\n".join(context_by_subquery[sub_query])
            context_sections.append(f"""Sub-question {i}: {sub_query}
Relevant Information:
{docs_text}""")
    
    # Join the sections with a visible separator line between them
    separator = "\n\n" + "="*80 + "\n\n"
    full_context = separator.join(context_sections)
    
    # Create synthesis prompt
    synthesis_prompt = f"""You are answering a complex question that has been broken down into 
sub-questions. Below you will find each sub-question along with relevant information retrieved 
for that sub-question.

Your task is to synthesize a comprehensive answer to the original question by integrating 
information from all sub-questions.

Guidelines:
1. Provide a complete, well-structured answer to the original question
2. Integrate information from all sub-questions cohesively
3. If information is missing or contradictory, acknowledge this
4. Be specific and cite key details from the context

Original Question: {original_query}

Sub-questions and Retrieved Information:
{full_context}

Comprehensive Answer:"""
    
    # Generate final answer
    response = llm.invoke(synthesis_prompt)
    return response.content

print("✓ Answer synthesis function defined")

## 9. Complete Multi-Query RAG Pipeline

In [None]:
def multi_query_rag(
    query: str,
    k_per_subquery: int = 3,
    verbose: bool = True
) -> Dict[str, Any]:
    """
    Complete multi-query RAG pipeline with sub-query decomposition.
    
    Args:
        query: Complex user question
        k_per_subquery: Number of documents to retrieve per sub-query
        verbose: Whether to print intermediate steps
    
    Returns:
        Dictionary containing answer and metadata
    """
    if verbose:
        print("="*80)
        print("MULTI-QUERY RAG PIPELINE")
        print("="*80)
        print(f"\nOriginal Query: {query}\n")
    
    # Step 1: Decompose query into sub-queries
    if verbose:
        print("Step 1: Decomposing query into sub-questions...")
    
    sub_queries = decompose_query(query)
    
    if verbose:
        print(f"\nGenerated {len(sub_queries)} sub-questions:")
        for i, sq in enumerate(sub_queries, 1):
            print(f"  {i}. {sq}")
        print()
    
    # Step 2: Retrieve documents for each sub-query
    if verbose:
        print("Step 2: Retrieving documents for each sub-question...")
    
    retrieved_dict = retrieve_for_subqueries(sub_queries, k=k_per_subquery)
    
    if verbose:
        total_before = sum(len(docs) for docs in retrieved_dict.values())
        print(f"  Retrieved {total_before} documents (before deduplication)\n")
    
    # Step 3: Deduplicate documents
    if verbose:
        print("Step 3: Deduplicating retrieved documents...")
    
    deduplicated_docs = deduplicate_documents(retrieved_dict)
    
    if verbose:
        print(f"  {len(deduplicated_docs)} unique documents after deduplication\n")
    
    # Step 4: Synthesize final answer
    if verbose:
        print("Step 4: Synthesizing comprehensive answer...\n")
    
    answer = synthesize_answer(query, sub_queries, deduplicated_docs)
    
    if verbose:
        print("="*80)
        print("FINAL ANSWER")
        print("="*80)
        print(answer)
        print("="*80)
    
    return {
        "query": query,
        "sub_queries": sub_queries,
        "answer": answer,
        "num_subqueries": len(sub_queries),
        "num_unique_docs": len(deduplicated_docs),
        "retrieved_docs": [(sq, doc.page_content) for sq, doc in deduplicated_docs]
    }

print("✓ Complete multi-query RAG pipeline defined")

## 10. Comparative Evaluation: Naive vs. Multi-Query RAG

Let's test both approaches on complex queries that require multi-hop reasoning.

In [None]:
# Define test queries with increasing complexity
test_queries = [
    "What are the differences in battery life and camera quality between the iPhone 15 Pro and Samsung Galaxy S24 Ultra?",
    "Compare the processing power, charging capabilities, and photography features across iPhone 15 Pro, Galaxy S24 Ultra, and Pixel 8 Pro",
    "Which flagship phone would be best for a photographer who needs long battery life and fast charging?"
]

print(f"Prepared {len(test_queries)} test queries for evaluation")

### Test Query 1: Two-Aspect Comparison

In [None]:
query_1 = test_queries[0]
print("\n" + "#"*80)
print("TEST QUERY 1")
print("#"*80)
print(f"Query: {query_1}\n")

In [None]:
# Naive RAG approach
print("\n--- NAIVE RAG APPROACH ---\n")
naive_result_1 = naive_rag(query_1, k=5)

print(f"Retrieved {naive_result_1['num_docs']} documents\n")
print("Answer:")
print(naive_result_1['answer'])
print("\n" + "-"*80)

In [None]:
# Multi-Query RAG approach
print("\n--- MULTI-QUERY RAG APPROACH ---\n")
multi_result_1 = multi_query_rag(query_1, k_per_subquery=3, verbose=True)

### Test Query 2: Multi-Device, Multi-Aspect Comparison

In [None]:
query_2 = test_queries[1]
print("\n" + "#"*80)
print("TEST QUERY 2")
print("#"*80)
print(f"Query: {query_2}\n")

In [None]:
# Naive RAG approach
print("\n--- NAIVE RAG APPROACH ---\n")
naive_result_2 = naive_rag(query_2, k=5)

print(f"Retrieved {naive_result_2['num_docs']} documents\n")
print("Answer:")
print(naive_result_2['answer'])
print("\n" + "-"*80)

In [None]:
# Multi-Query RAG approach
print("\n--- MULTI-QUERY RAG APPROACH ---\n")
multi_result_2 = multi_query_rag(query_2, k_per_subquery=3, verbose=True)

### Test Query 3: Recommendation with Multiple Criteria

In [None]:
query_3 = test_queries[2]
print("\n" + "#"*80)
print("TEST QUERY 3")
print("#"*80)
print(f"Query: {query_3}\n")

In [None]:
# Naive RAG approach
print("\n--- NAIVE RAG APPROACH ---\n")
naive_result_3 = naive_rag(query_3, k=5)

print(f"Retrieved {naive_result_3['num_docs']} documents\n")
print("Answer:")
print(naive_result_3['answer'])
print("\n" + "-"*80)

In [None]:
# Multi-Query RAG approach
print("\n--- MULTI-QUERY RAG APPROACH ---\n")
multi_result_3 = multi_query_rag(query_3, k_per_subquery=3, verbose=True)

## 11. Advanced Technique: Diverse Multi-Query Rewriting (DMQR-RAG)

DMQR-RAG generates diverse rewrites of a query at different levels of information granularity to improve retrieval coverage. The implementation in this section is a simplified, prompt-based take on that idea.

In [None]:
# Define DMQR prompt template
DMQR_TEMPLATE = """You are an expert at generating diverse query variations to improve information retrieval.

Your task is to rewrite the given question in multiple ways, each focusing on a different aspect or 
granularity level:

1. HIGH-LEVEL: A broad, conceptual version of the question
2. SPECIFIC: A detailed, technical version with specific terms
3. ALTERNATIVE PHRASING: Same meaning, completely different wording
4. DOMAIN-SPECIFIC: Using domain-specific terminology and concepts

Each variation should capture the same underlying information need but from a different angle.

Return ONLY a valid JSON array with exactly 4 query variations, nothing else.

Original Question: {question}

Diverse Query Variations (as JSON array):"""

dmqr_prompt = PromptTemplate(
    template=DMQR_TEMPLATE,
    input_variables=["question"]
)

print("✓ DMQR prompt template created")

In [None]:
def dmqr_generate_queries(query: str) -> List[str]:
    """
    Generate diverse multi-query rewrites using DMQR approach.
    
    Args:
        query: Original user question
    
    Returns:
        List of diverse query variations
    """
    # Format prompt
    prompt = dmqr_prompt.format(question=query)
    
    # Get LLM response
    response = llm.invoke(prompt)
    
    # Parse JSON response
    try:
        response_text = response.content.strip()
        if "```json" in response_text:
            response_text = response_text.split("```json")[1].split("```")[0]
        elif "```" in response_text:
            response_text = response_text.split("```")[1].split("```")[0]
        
        query_variations = json.loads(response_text)
        
        if not isinstance(query_variations, list):
            raise ValueError("Response is not a list")
        
        # Include original query as well
        return [query] + query_variations
    
    except (json.JSONDecodeError, ValueError) as e:
        print(f"Warning: Failed to parse DMQR queries: {e}")
        print(f"Response was: {response.content}")
        return [query]

print("✓ DMQR query generation function defined")
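
Because `dmqr_generate_queries` prepends the original query to the generated variations, the list can contain repeats if the model echoes the question back. A small sketch of one way to filter these out before retrieval follows; `unique_queries` is a hypothetical helper, and it only catches repeats that differ in case or whitespace.

```python
from typing import List


def unique_queries(queries: List[str]) -> List[str]:
    """Drop repeated queries, ignoring case and extra whitespace; keep first-seen order."""
    seen = set()
    unique = []
    for q in queries:
        key = " ".join(q.lower().split())  # normalize case and whitespace
        if key and key not in seen:
            seen.add(key)
            unique.append(q.strip())
    return unique


variations = unique_queries([
    "What makes a smartphone camera system high quality?",
    "what makes a smartphone camera system high quality? ",
    "Which factors determine smartphone camera quality?",
])
print(len(variations), "unique queries")
```

Each removed repeat saves one `similarity_search` call in the retrieval loop.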

In [None]:
def dmqr_rag(
    query: str,
    k_per_variation: int = 2,
    verbose: bool = True
) -> Dict[str, Any]:
    """
    DMQR-RAG: Diverse Multi-Query Rewriting for RAG.
    
    Args:
        query: User question
        k_per_variation: Number of documents to retrieve per variation
        verbose: Whether to print intermediate steps
    
    Returns:
        Dictionary containing answer and metadata
    """
    if verbose:
        print("="*80)
        print("DMQR-RAG PIPELINE")
        print("="*80)
        print(f"\nOriginal Query: {query}\n")
    
    # Generate diverse query variations
    if verbose:
        print("Step 1: Generating diverse query variations...")
    
    query_variations = dmqr_generate_queries(query)
    
    if verbose:
        print(f"\nGenerated {len(query_variations)} query variations:")
        for i, qv in enumerate(query_variations, 1):
            print(f"  {i}. {qv}")
        print()
    
    # Retrieve documents for each variation
    if verbose:
        print("Step 2: Retrieving documents for each variation...")
    
    all_docs = []
    seen_content = set()
    
    for variation in query_variations:
        docs = vectorstore.similarity_search(variation, k=k_per_variation)
        
        # Deduplicate on the fly
        for doc in docs:
            content_key = doc.page_content.strip()
            if content_key not in seen_content:
                seen_content.add(content_key)
                all_docs.append(doc)
    
    if verbose:
        print(f"  Retrieved {len(all_docs)} unique documents\n")
    
    # Prepare context
    context = "\n\n".join([doc.page_content for doc in all_docs])
    
    # Generate answer
    if verbose:
        print("Step 3: Generating comprehensive answer...\n")
    
    prompt = f"""Using the following context, please answer the question comprehensively.

Context:
{context}

Question: {query}

Answer:"""
    
    response = llm.invoke(prompt)
    
    if verbose:
        print("="*80)
        print("FINAL ANSWER")
        print("="*80)
        print(response.content)
        print("="*80)
    
    return {
        "query": query,
        "query_variations": query_variations,
        "answer": response.content,
        "num_variations": len(query_variations),
        "num_unique_docs": len(all_docs)
    }

print("✓ DMQR-RAG pipeline defined")

### Test DMQR-RAG

In [None]:
# Test DMQR-RAG on a broad, open-ended query
test_query = "What makes a smartphone camera system high quality?"

dmqr_result = dmqr_rag(test_query, k_per_variation=2, verbose=True)

## 12. Analysis and Insights

### Key Observations

**1. Retrieval Coverage:**
- **Naive RAG**: Performs a single retrieval pass with k=5 documents, which may miss information if the query is complex or poorly phrased
- **Multi-Query RAG**: Decomposes complex queries into sub-questions, retrieves k documents per sub-question, then deduplicates. Each facet of the question gets its own retrieval pass, which makes it more likely that all relevant information is found
- **DMQR-RAG**: Generates diverse query variations at different granularity levels, casting a wider net to capture documents that might be missed by a single query formulation

**2. Answer Quality:**
- **Naive RAG**: Can provide good answers for simple queries, but struggles with multi-faceted questions or queries requiring information synthesis from multiple documents
- **Multi-Query RAG**: Excels at complex queries by ensuring all aspects are addressed. The structured context organization helps the LLM generate more comprehensive answers
- **DMQR-RAG**: Particularly effective when the knowledge base uses varied terminology or when queries can be interpreted at different abstraction levels

**3. Trade-offs:**
- **Latency**: Multi-query approaches require additional LLM calls (for decomposition/rewriting) and multiple retrieval operations, increasing response time
- **Cost**: More LLM API calls mean higher costs, especially with proprietary models
- **Complexity**: Increased system complexity requires more sophisticated error handling and monitoring
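
The latency and cost trade-off can be made concrete by counting calls per user query, as read off the three pipelines defined earlier in this notebook. A rough sketch follows; the helper name `pipeline_call_counts` is hypothetical, and it ignores the fallback paths taken when JSON parsing fails.

```python
from typing import Dict


def pipeline_call_counts(num_sub_queries: int, num_variations: int) -> Dict[str, Dict[str, int]]:
    """Count LLM and vector-store calls per user query for the three pipelines."""
    return {
        # naive_rag: one similarity search, one generation call
        "naive": {"llm_calls": 1, "retrievals": 1},
        # multi_query_rag: decomposition + synthesis, one search per sub-query
        "multi_query": {"llm_calls": 2, "retrievals": num_sub_queries},
        # dmqr_rag: rewriting + generation, one search per variation plus the original query
        "dmqr": {"llm_calls": 2, "retrievals": num_variations + 1},
    }


counts = pipeline_call_counts(num_sub_queries=3, num_variations=4)
for name, c in counts.items():
    print(f"{name:12s} LLM calls: {c['llm_calls']}  retrievals: {c['retrievals']}")
```

With three sub-questions and the four variations requested by the DMQR prompt, both multi-query pipelines double the number of LLM calls relative to naive RAG and issue three to five times as many searches.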

### When to Use Each Approach

**Use Naive RAG when:**
- Queries are simple and well-defined
- Latency is critical
- Cost optimization is a priority
- Knowledge base is small and well-structured

**Use Multi-Query RAG when:**
- Queries are inherently complex with multiple facets
- Answer completeness is more important than speed
- Knowledge base is large with information spread across documents
- Application requires multi-hop reasoning

**Use DMQR-RAG when:**
- Knowledge base uses inconsistent or varied terminology
- Queries can be interpreted at different abstraction levels
- Maximum retrieval recall is critical
- Domain expertise varies among users (some technical, some not)

## 13. Production Considerations

### Optimization Strategies

1. **Adaptive Query Decomposition**: Not all queries need decomposition. Implement a classifier to detect query complexity and route simple queries to naive RAG

2. **Parallel Processing**: Execute sub-query retrievals in parallel using async/await or threading to reduce latency

3. **Caching**: Cache decomposition results and embeddings for frequently asked queries

4. **Smart Deduplication**: Use semantic similarity for deduplication instead of exact content matching to catch near-duplicates

5. **Result Ranking**: Implement scoring mechanisms (e.g., Reciprocal Rank Fusion) to prioritize the most relevant documents across sub-queries
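
As an illustration of the result-ranking idea, here is a minimal sketch of Reciprocal Rank Fusion. Each document receives a score of `1 / (k + rank)` from every ranked list it appears in, and the scores are summed. The constant `k = 60` is a commonly used default, and the document IDs are hypothetical; in this notebook the chunk text itself could serve as the ID.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one list, best first."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; ties keep first-seen order because sorted() is stable
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# One ranked list per sub-query or query variation
rankings = [
    ["doc_battery", "doc_camera", "doc_chip"],
    ["doc_camera", "doc_battery"],
    ["doc_camera", "doc_chip"],
]
fused = reciprocal_rank_fusion(rankings)
print(fused)
```

A document that ranks well in several lists, such as `doc_camera` here, rises above one that tops only a single list.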

In [None]:
# Example: Simple query complexity classifier
def classify_query_complexity(query: str) -> str:
    """
    Classify query complexity to determine which RAG approach to use.
    
    Args:
        query: User question
    
    Returns:
        'simple' or 'complex'
    """
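    complexity_indicators = [
        'compare', 'difference', 'differences', 'versus', 'vs', 'and',
        'both', 'all', 'multiple', 'various', 'different'
    ]
    
    # Split into whole words (punctuation stripped) so that 'and' does not
    # match inside 'brand' or 'Android', and 'all' does not match 'small'
    query_words = {word.strip(".,;:?!()\"'") for word in query.lower().split()}
    
    # Check for complexity indicators
    indicator_count = sum(1 for indicator in complexity_indicators if indicator in query_words)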
    
    # Check query length
    word_count = len(query.split())
    
    # Simple heuristic: complex if multiple indicators or very long query
    if indicator_count >= 2 or word_count > 15:
        return 'complex'
    else:
        return 'simple'

# Test the classifier
test_queries_classify = [
    "What is the battery life of iPhone 15 Pro?",
    "Compare battery life and camera quality between iPhone and Samsung",
    "Which phone has the best camera?"
]

print("Query Complexity Classification:")
print("="*80)
for tq in test_queries_classify:
    classification = classify_query_complexity(tq)
    print(f"Query: {tq}")
    print(f"Classification: {classification}")
    print("-"*80)
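
A classifier like the one above becomes useful once it drives routing. The following sketch shows one possible shape for that step. `route_query` and the `demo_*` functions are hypothetical stand-ins so the example runs on its own; in the notebook they would correspond to `classify_query_complexity`, `naive_rag`, and `multi_query_rag`.

```python
from typing import Any, Callable, Dict

Pipeline = Callable[[str], Dict[str, Any]]


def route_query(query: str, classify: Callable[[str], str],
                simple_pipeline: Pipeline, complex_pipeline: Pipeline) -> Dict[str, Any]:
    """Run the cheap pipeline for 'simple' queries and the expensive one otherwise."""
    label = classify(query)
    pipeline = simple_pipeline if label == "simple" else complex_pipeline
    result = dict(pipeline(query))  # copy so the pipeline's own dict is not mutated
    result["route"] = label  # record the decision for later monitoring
    return result


# Stand-ins so the sketch runs without an LLM or vector store
def demo_classify(query: str) -> str:
    return "complex" if " and " in f" {query.lower()} " else "simple"


def demo_naive(query: str) -> Dict[str, Any]:
    return {"answer": f"[naive] {query}"}


def demo_multi(query: str) -> Dict[str, Any]:
    return {"answer": f"[multi-query] {query}"}


routed = route_query("Compare battery life and camera quality", demo_classify, demo_naive, demo_multi)
print(routed["route"], "->", routed["answer"])
```

Storing the routing decision in the result makes it possible to audit later how often each path is taken and whether the heuristic misroutes queries.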

## 14. Key Takeaways

1. **Query decomposition helps with complex queries**: Breaking multi-faceted questions into simpler sub-questions can substantially improve retrieval coverage and answer comprehensiveness

2. **Sub-queries should be independent**: Each sub-question should be answerable on its own to enable parallel retrieval and avoid sequential dependencies

3. **Deduplication is critical**: Multiple sub-queries will often retrieve overlapping documents, so efficient deduplication prevents redundant context and token waste

4. **Context organization matters**: Structuring retrieved contexts by sub-query helps the LLM understand which information addresses which aspect of the original question

5. **Diverse query rewriting (DMQR) expands coverage**: Generating queries at different granularity levels captures documents that single-perspective queries might miss

6. **Trade-offs exist**: Multi-query approaches improve quality but increase latency and cost, so choose based on your application's requirements

7. **Adaptive routing is key for production**: Not all queries need complex decomposition; simple queries should be handled efficiently with naive RAG

## 15. Next Steps and Further Exploration

To extend this demo:

1. **Implement dependency detection**: Analyze sub-queries to identify dependencies and execute them sequentially when needed

2. **Add Reciprocal Rank Fusion (RRF)**: Implement sophisticated ranking algorithms to score and prioritize documents across sub-queries

3. **Experiment with different LLMs**: Test decomposition quality with different models (GPT-4, Claude, open-source models)

4. **Build evaluation metrics**: Implement automated evaluation to measure retrieval recall, precision, and answer quality

5. **Integrate with hybrid search**: Combine multi-query decomposition with hybrid search (dense + sparse vectors) for even better retrieval

6. **Add query intent classification**: Build a classifier to route queries to the most appropriate decomposition strategy
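
As a starting point for the evaluation item, retrieval quality can be measured with precision and recall at a cutoff `k`, given a hand-labeled set of relevant chunks per query. The sketch below assumes such labels exist; the function name and the document IDs are hypothetical.

```python
from typing import List, Set, Tuple


def precision_recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> Tuple[float, float]:
    """Compute precision@k and recall@k for one query."""
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Ranked retrieval output and the labeled relevant set for one test query
retrieved_ids = ["d1", "d4", "d2", "d7"]
relevant_ids = {"d1", "d2", "d3"}

precision, recall = precision_recall_at_k(retrieved_ids, relevant_ids, k=3)
print(f"precision@3 = {precision:.2f}, recall@3 = {recall:.2f}")
```

Running the same labeled queries through naive, multi-query, and DMQR retrieval would give a like-for-like comparison of recall, which is the metric the decomposition techniques aim to improve.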

---

**End of Demo #2**