# Demo #2: Multi-Query Decomposition - Complex Query Handling

## Overview

This demo shows **Multi-Query Decomposition**, a technique that breaks a complex, multi-faceted query into simpler sub-queries so that each facet gets its own retrieval pass before the results are synthesized into one answer.

### Core Concept

Traditional RAG systems treat each query as a single unit. However, many real-world questions are complex and require information from multiple sources or perspectives. A single retrieval pass may miss important context.

**Multi-Query Decomposition solves this by:**
1. Analyzing the complex query to identify sub-questions
2. Decomposing it into multiple simple sub-queries
3. Executing each sub-query independently (parallel retrieval)
4. Aggregating all retrieved contexts
5. Synthesizing a comprehensive answer from the combined information
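
The five steps above can be sketched in plain Python. This is a toy illustration, not how LlamaIndex implements the technique: the corpus, the keyword-based `decompose` and `retrieve` helpers, and the `join_contexts` synthesizer are all hypothetical stand-ins for the LLM and the vector index.

```python
from typing import Callable

# Toy corpus standing in for an indexed knowledge base (made up for illustration).
CORPUS = {
    "gradient boosting": "Gradient boosting builds trees sequentially and can overfit without tuning.",
    "random forests": "Random forests average many decorrelated trees, which reduces variance.",
}


def decompose(query: str, topics: list[str]) -> list[str]:
    """Steps 1-2: emit one sub-query per known topic mentioned in the query."""
    return [f"What are the properties of {t}?" for t in topics if t in query.lower()]


def retrieve(sub_query: str) -> list[str]:
    """Step 3: keyword lookup standing in for vector retrieval."""
    return [text for topic, text in CORPUS.items() if topic in sub_query.lower()]


def answer(query: str, synthesize: Callable[[str, list[str]], str]) -> str:
    sub_queries = decompose(query, list(CORPUS))
    contexts: list[str] = []
    for sub_query in sub_queries:
        for chunk in retrieve(sub_query):
            if chunk not in contexts:  # Step 4: aggregate, dropping duplicates
                contexts.append(chunk)
    return synthesize(query, contexts)  # Step 5: synthesize from combined context


def join_contexts(query: str, contexts: list[str]) -> str:
    """Placeholder synthesizer: a real system would call an LLM here."""
    return " ".join(contexts)


result = answer("Compare gradient boosting and random forests.", join_contexts)
print(result)
```

Because each topic gets its own lookup, the combined context contains material on both algorithms even though no single lookup targeted both.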

### Key Benefits
- Better coverage for multi-hop reasoning questions
- More comprehensive answers by retrieving from multiple perspectives
- Explicit sub-question generation aids interpretability
- Enables comparison and synthesis of different concepts

### Citation
- **Reference**: "Build Advanced Retrieval-Augmented Generation Systems" - Microsoft Learn
- **Paper**: "MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries"
  - Link: https://hf.co/papers/2401.15391
- **Paper**: "BeamAggR: Beam Aggregation Reasoning over Multi-source Knowledge"
  - Link: https://hf.co/papers/2406.19820

## 1. Setup and Installation

In [None]:
# Install required packages
# !pip install llama-index llama-index-llms-azure-openai llama-index-embeddings-azure-openai python-dotenv

In [None]:
import os
from dotenv import load_dotenv
from pathlib import Path

# llama-index core imports
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    Settings,
)
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata

# Azure OpenAI imports
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding

print("✓ All imports successful")

## 2. Configure Azure OpenAI

In [None]:
# Load environment variables
load_dotenv()

# Configure Azure OpenAI LLM
azure_llm = AzureOpenAI(
    model="gpt-4",
    deployment_name=os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
    temperature=0.1,
)

# Configure Azure OpenAI Embeddings
azure_embed = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    deployment_name=os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
)

# Set global defaults
Settings.llm = azure_llm
Settings.embed_model = azure_embed
Settings.chunk_size = 512
Settings.chunk_overlap = 50

print("✓ Azure OpenAI configured successfully")
print(f"  LLM: {os.getenv('AZURE_OPENAI_DEPLOYMENT_NAME')}")
print(f"  Embeddings: {os.getenv('AZURE_OPENAI_EMBEDDING_DEPLOYMENT')}")
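
The configuration cell reads five environment variables, and a missing one would only surface later as an authentication or deployment error. A minimal pre-flight check might look like the sketch below; `missing_env_vars` is a hypothetical helper, not part of any library, and the variable names are the ones used in this notebook.

```python
import os

# Environment variables read by the configuration cell in this notebook.
REQUIRED_VARS = [
    "AZURE_OPENAI_DEPLOYMENT_NAME",
    "AZURE_OPENAI_EMBEDDING_DEPLOYMENT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_VERSION",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set")
```

Running this right after `load_dotenv()` points at a `.env` problem before any API call is made.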

## 3. Load Documents

We'll load multiple ML algorithm documents to create a diverse knowledge base suitable for comparative questions.

In [None]:
# Define data path
data_path = "../RAG_v2/data/ml_concepts"

# Load documents
reader = SimpleDirectoryReader(data_path)
documents = reader.load_data()

print(f"✓ Loaded {len(documents)} documents:")
for i, doc in enumerate(documents, 1):
    filename = Path(doc.metadata['file_path']).name
    print(f"  {i}. {filename}")

## 4. Chunk and Index Documents

In [None]:
# Initialize sentence splitter
splitter = SentenceSplitter(
    chunk_size=512,
    chunk_overlap=50,
)

# Parse documents into nodes
nodes = splitter.get_nodes_from_documents(documents)

# Create vector store index
index = VectorStoreIndex(nodes, embed_model=azure_embed)

print(f"✓ Created {len(nodes)} chunks")
print("✓ Vector index built successfully")
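
To see what `chunk_size` and `chunk_overlap` mean, the sketch below applies the same sliding-window idea to a list of words. It is a simplification under stated assumptions: `SentenceSplitter` counts tokens and tries to respect sentence boundaries, whereas the hypothetical `chunk_words` helper here just slices words.

```python
def chunk_words(words, chunk_size, chunk_overlap):
    """Split a word list into windows of chunk_size that share chunk_overlap words."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


words = [f"w{i}" for i in range(10)]
chunks = chunk_words(words, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

Each window starts `chunk_size - chunk_overlap` words after the previous one, so neighbouring chunks share their boundary words and a sentence that straddles a boundary is less likely to be cut off from its context.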

## 5. Create Baseline Single-Query Engine

First, let's establish a baseline with a standard query engine.

In [None]:
# Create baseline query engine
baseline_engine = index.as_query_engine(
    similarity_top_k=3,
    llm=azure_llm,
)

print("✓ Baseline query engine created")
print("  Configuration: Single-pass retrieval with top-3 chunks")

## 6. Test Baseline with Complex Query

Let's test with a complex, multi-faceted query that requires synthesizing information from multiple sources.

In [None]:
# Define complex test query that requires comparison
complex_query = "Compare the strengths and weaknesses of gradient boosting and random forests for classification tasks. Which is more prone to overfitting?"

print("Complex Test Query:")
print(f"  {complex_query}")
print("\n" + "="*80)

# Query baseline engine
baseline_response = baseline_engine.query(complex_query)

print("\n🔍 BASELINE SINGLE-QUERY RETRIEVAL")
print("="*80)
print(f"\nGenerated Answer:\n{baseline_response.response}")

# Display retrieved sources
print("\n" + "-"*80)
print("Retrieved Chunks:")
print("-"*80)
for i, node in enumerate(baseline_response.source_nodes, 1):
    # score may be None for some retrievers, so format it defensively
    score = f"{node.score:.4f}" if node.score is not None else "N/A"
    print(f"\nChunk {i}:")
    print(f"  Source: {Path(node.metadata.get('file_path', 'Unknown')).name}")
    print(f"  Score: {score}")
    print(f"  Content: {node.text[:200]}...")
    print("-"*80)

print("\n⚠️ Potential Limitations:")
print("   - Single retrieval pass may miss information about one of the algorithms")
print("   - Difficult to ensure balanced coverage of both algorithms")
print("   - May not retrieve comparison-specific information")
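
The limitations listed above can be checked rather than assumed. The sketch below counts how many retrieved chunks mention each topic; `topic_coverage` is a hypothetical helper and the sample chunks are invented for illustration.

```python
def topic_coverage(chunk_texts, topics):
    """Count how many chunks mention each topic (case-insensitive substring match)."""
    return {
        topic: sum(topic.lower() in text.lower() for text in chunk_texts)
        for topic in topics
    }


# Invented chunk texts standing in for retrieved content.
sample_chunks = [
    "Gradient boosting fits each new tree to the residual errors.",
    "Gradient boosting is sensitive to the learning rate.",
    "Random forests train trees on bootstrap samples.",
]
coverage = topic_coverage(
    sample_chunks, ["gradient boosting", "random forests", "overfitting"]
)
print(coverage)
```

With the notebook objects, the chunk texts could be collected as `[n.text for n in baseline_response.source_nodes]`. A topic with a count of zero is a sign that the single retrieval pass missed that part of the question.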

## 7. Implement Sub-Question Query Engine

Now let's create a query engine that automatically decomposes complex queries into sub-questions.

In [None]:
# Create query engine tool for the knowledge base
query_engine_tool = QueryEngineTool(
    query_engine=index.as_query_engine(similarity_top_k=3),
    metadata=ToolMetadata(
        name="ml_algorithms",
        description="Knowledge base containing detailed information about machine learning algorithms including gradient boosting, random forests, neural networks, support vector machines, and k-means clustering. Use this tool to answer questions about specific ML algorithms, their properties, strengths, weaknesses, and applications.",
    ),
)

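# Create sub-question query engine
# Note: from_defaults runs sub-questions asynchronously by default (use_async=True).
# In a notebook this may require nest_asyncio.apply(); treat that as an assumption
# to verify against your installed llama-index version.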
subquestion_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=[query_engine_tool],
    llm=azure_llm,
    verbose=True,  # Enable verbose mode to see sub-questions
)

print("✓ Sub-Question Query Engine created")
print("  Enhancement: Complex query → LLM decomposes → Multiple sub-queries → Aggregate → Synthesize")

## 8. Test Sub-Question Decomposition

Let's run the same complex query through the sub-question engine and observe the decomposition.

In [None]:
print("\n🎯 SUB-QUESTION QUERY ENGINE")
print("="*80)
print(f"\nComplex Query: {complex_query}")
print("\n" + "-"*80)
print("Generating sub-questions and retrieving...\n")

# Query with sub-question decomposition
# The verbose output will show the sub-questions automatically
subquestion_response = subquestion_engine.query(complex_query)

print("\n" + "="*80)
print("FINAL SYNTHESIZED ANSWER")
print("="*80)
print(f"\n{subquestion_response.response}")
print("\n" + "="*80)

## 9. Analyze Sub-Questions

Let's extract and display the sub-questions that were generated.

In [None]:
print("\n📋 GENERATED SUB-QUESTIONS")
print("="*80)

# Extract sub-questions from metadata
if hasattr(subquestion_response, 'metadata') and subquestion_response.metadata:
    sub_questions = subquestion_response.metadata.get('sub_questions', [])
    
    if sub_questions:
        for i, sq in enumerate(sub_questions, 1):
            print(f"\nSub-Question {i}:")
            print(f"  Query: {sq.get('query', 'N/A')}")
            print(f"  Tool: {sq.get('tool_name', 'N/A')}")
    else:
        print("  Note: Sub-questions processed but not stored in metadata")
        print("  (They were displayed in verbose output above)")
else:
    print("  Note: Sub-questions were generated and processed")
    print("  (They were displayed in verbose output above)")

print("\n" + "="*80)
print("\n💡 Key Insight:")
print("   The LLM decomposes the complex comparison query into focused")
print("   sub-questions. A typical decomposition targets aspects such as:")
print("   - Strengths of gradient boosting")
print("   - Weaknesses of gradient boosting")
print("   - Strengths of random forests")
print("   - Weaknesses of random forests")
print("   - Overfitting characteristics of each")
print("\n   Each sub-question is retrieved independently, which improves coverage.")
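
If the metadata lookup comes back empty, the sub-questions may still be recoverable from the response's source nodes. The sketch below rests on an assumption about LlamaIndex internals that this demo does not confirm: that the engine adds one source node per sub-question whose text has the form `Sub question: ...` followed on the next line by `Response: ...`. Verify that format against your installed version before relying on it. `parse_sub_question_nodes` is a hypothetical helper and the sample texts are invented.

```python
def parse_sub_question_nodes(node_texts):
    """Extract (sub_question, answer) pairs from node texts in the assumed format."""
    prefix = "Sub question: "   # assumed format, not confirmed by this demo
    marker = "\nResponse: "     # assumed format, not confirmed by this demo
    pairs = []
    for text in node_texts:
        if text.startswith(prefix) and marker in text:
            question, answer = text[len(prefix):].split(marker, 1)
            pairs.append((question.strip(), answer.strip()))
    return pairs


sample_texts = [
    "Sub question: What are the strengths of random forests?\nResponse: They are robust to noise.",
    "Random forests train trees on bootstrap samples.",  # ordinary chunk, skipped
]
pairs = parse_sub_question_nodes(sample_texts)
for question, answer in pairs:
    print(f"Q: {question}\nA: {answer}")
```

With the notebook objects, the input could be `[n.text for n in subquestion_response.source_nodes]`. Texts that do not match the assumed format are ignored, so the helper returns an empty list rather than failing if the format differs.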

## 10. Side-by-Side Comparison

In [None]:
print("\n📊 COMPARISON: BASELINE vs SUB-QUESTION DECOMPOSITION")
print("="*80)

print("\nBASELINE APPROACH:")
print("-"*80)
print("  Retrieval Strategy: Single query, top-3 chunks")
print(f"  Chunks Retrieved: {len(baseline_response.source_nodes)}")
print(f"  Answer Length: {len(baseline_response.response)} characters")
print(f"  Unique Sources: {len(set([Path(n.metadata.get('file_path', '')).name for n in baseline_response.source_nodes]))}")

print("\nSUB-QUESTION APPROACH:")
print("-"*80)
print("  Retrieval Strategy: Multiple sub-queries, independent retrieval")
print(f"  Chunks Retrieved: {len(subquestion_response.source_nodes) if hasattr(subquestion_response, 'source_nodes') else 'Multiple per sub-question'}")
print(f"  Answer Length: {len(subquestion_response.response)} characters")
if hasattr(subquestion_response, 'source_nodes'):
    print(f"  Unique Sources: {len(set([Path(n.metadata.get('file_path', '')).name for n in subquestion_response.source_nodes]))}")

print("\n" + "="*80)
print("\n🎯 EXPECTED IMPROVEMENTS:")
print("   ✓ More comprehensive coverage of both algorithms")
print("   ✓ Better balanced information retrieval")
print("   ✓ Explicit comparison structure")
print("   ✓ Less chance of missing key information")
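
The two summaries above repeat the same expressions, so they could be factored into one helper. The sketch below is one way to do that, assuming only that a response exposes `.response` and `.source_nodes` and that each node has a `.metadata` dict. `summarize_response`, the fake objects, and the file name are hypothetical. Unlike the cell above, it does not count nodes without a `file_path` as a source.

```python
from pathlib import PurePath
from types import SimpleNamespace


def summarize_response(response):
    """Collect chunk count, unique source files, and answer length for a response."""
    nodes = getattr(response, "source_nodes", None) or []
    sources = {
        PurePath(node.metadata["file_path"]).name
        for node in nodes
        if node.metadata.get("file_path")  # skip nodes with no file path
    }
    return {
        "chunks": len(nodes),
        "unique_sources": len(sources),
        "answer_chars": len(response.response or ""),
    }


# Fake response object standing in for a real query result.
fake = SimpleNamespace(
    response="Both are tree ensembles.",
    source_nodes=[
        SimpleNamespace(metadata={"file_path": "data/gradient_boosting.md"}),
        SimpleNamespace(metadata={"file_path": "data/gradient_boosting.md"}),
        SimpleNamespace(metadata={}),
    ],
)
summary = summarize_response(fake)
print(summary)
```

Calling it on both `baseline_response` and `subquestion_response` would give two dictionaries that can be printed side by side.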

## 11. Additional Test Cases

Let's run a few more complex queries to see how sub-question decomposition handles other question types.

In [None]:
# Additional complex queries
test_queries = [
    "What are the key differences between supervised and unsupervised learning? Provide examples of algorithms for each.",
    "How do neural networks differ from support vector machines in terms of training process, interpretability, and performance on high-dimensional data?",
    "Explain the bias-variance tradeoff and how it relates to ensemble methods like random forests and gradient boosting.",
]

print("\n🔬 ADDITIONAL TEST CASES")
print("="*80)

for i, query in enumerate(test_queries, 1):
    print(f"\n{'='*80}")
    print(f"Test Case {i}")
    print("="*80)
    print(f"Query: {query}")
    print("\n" + "-"*80)
    print("Processing with Sub-Question Decomposition...\n")
    
    response = subquestion_engine.query(query)
    
    print("\n" + "-"*80)
    print("Answer:")
    print("-"*80)
    print(response.response)
    print("\n")

## 12. Data Flow Visualization

In [None]:
print("\n📈 SUB-QUESTION DECOMPOSITION DATA FLOW")
print("="*80)
print("""
BASELINE SINGLE-QUERY PIPELINE:
┌───────────────────────────────┐
│  Complex Query                │
│  "Compare A and B"            │
└──────────┬────────────────────┘
           │
           ▼
┌───────────────────────────────┐
│  Single Retrieval Pass        │
│  (may favor one aspect)       │
└──────────┬────────────────────┘
           │
           ▼
┌───────────────────────────────┐
│  Top-K Chunks                 │
│  (potentially unbalanced)     │
└──────────┬────────────────────┘
           │
           ▼
┌───────────────────────────────┐
│  LLM Generates Answer         │
│  (from limited context)       │
└───────────────────────────────┘

SUB-QUESTION DECOMPOSITION PIPELINE:
┌───────────────────────────────┐
│  Complex Query                │
│  "Compare A and B"            │
└──────────┬────────────────────┘
           │
           ▼
┌───────────────────────────────┐
│  LLM Decomposes Query         │ ← KEY INNOVATION
│  Sub-Q1: "Strengths of A"     │
│  Sub-Q2: "Weaknesses of A"    │
│  Sub-Q3: "Strengths of B"     │
│  Sub-Q4: "Weaknesses of B"    │
└──────────┬────────────────────┘
           │
           ▼
┌───────────────────────────────┐
│  Parallel Retrieval           │
│  ┌─────────┐  ┌─────────┐    │
│  │ Sub-Q1  │  │ Sub-Q2  │    │
│  │Retrieve │  │Retrieve │    │
│  └─────────┘  └─────────┘    │
│  ┌─────────┐  ┌─────────┐    │
│  │ Sub-Q3  │  │ Sub-Q4  │    │
│  │Retrieve │  │Retrieve │    │
│  └─────────┘  └─────────┘    │
└──────────┬────────────────────┘
           │
           ▼
┌───────────────────────────────┐
│  Aggregate All Contexts       │
│  (balanced, comprehensive)    │
└──────────┬────────────────────┘
           │
           ▼
┌───────────────────────────────┐
│  LLM Synthesizes Answer       │
│  (from rich, diverse context) │
└───────────────────────────────┘
""")
print("="*80)
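
The "Parallel Retrieval" box in the diagram can be illustrated with `asyncio.gather`, which runs several coroutines concurrently and returns their results in input order. This is a sketch of the concurrency pattern only: `fake_retrieve` is a hypothetical stand-in that sleeps instead of querying an index.

```python
import asyncio


async def fake_retrieve(sub_query: str, delay: float = 0.05) -> str:
    """Stand-in for an async retrieval call; sleeps instead of hitting an index."""
    await asyncio.sleep(delay)
    return f"context for: {sub_query}"


async def retrieve_all(sub_queries):
    """Run all retrievals concurrently; results keep the order of sub_queries."""
    return await asyncio.gather(*(fake_retrieve(q) for q in sub_queries))


sub_queries = ["Strengths of A", "Weaknesses of A", "Strengths of B", "Weaknesses of B"]
contexts = asyncio.run(retrieve_all(sub_queries))
for context in contexts:
    print(context)
```

The four lookups overlap in time, so the total wait is close to one delay rather than four. Inside a Jupyter notebook, where an event loop is already running, `await retrieve_all(sub_queries)` would be used in place of `asyncio.run(...)`.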

## 13. Performance Analysis

In [None]:
import time

print("\n⚡ PERFORMANCE ANALYSIS")
print("="*80)

# Measure baseline query time (perf_counter is a monotonic, high-resolution clock)
start_time = time.perf_counter()
_ = baseline_engine.query("What is gradient boosting?")
baseline_time = time.perf_counter() - start_time

# Measure sub-question query time
start_time = time.perf_counter()
_ = subquestion_engine.query("What is gradient boosting?")
subquestion_time = time.perf_counter() - start_time

print(f"\nBaseline Query Time: {baseline_time:.2f} seconds")
print(f"Sub-Question Query Time: {subquestion_time:.2f} seconds")
print(f"Overhead: {subquestion_time - baseline_time:.2f} seconds ({((subquestion_time/baseline_time - 1) * 100):.1f}% increase)")

print("\n" + "-"*80)
print("\n💡 Trade-offs:")
print("   Cost: Additional LLM calls for decomposition and synthesis")
print("   Benefit: Broader coverage for complex, multi-part queries")
print("   \nRecommendation: Use sub-question decomposition for:")
print("     - Comparison queries (\"A vs B\")")
print("     - Multi-aspect questions")
print("     - Questions requiring information synthesis")
print("     - Research and analysis tasks")
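
A single timed call is noisy, because network latency and service load vary between requests. One common remedy is to repeat the call and report the median. The sketch below assumes nothing about the query engines: `time_call` is a hypothetical helper, and `slow_echo` stands in for an engine's `query` method.

```python
import statistics
import time


def time_call(fn, *args, repeats=3):
    """Return the median wall-clock duration of fn(*args) over several runs."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


def slow_echo(text):
    """Stand-in for a query call; sleeps briefly and returns its input."""
    time.sleep(0.01)
    return text


median_seconds = time_call(slow_echo, "What is gradient boosting?")
print(f"Median latency: {median_seconds:.3f} s")
```

With the notebook objects, `time_call(baseline_engine.query, "What is gradient boosting?")` would time the baseline. Each repeat is a billed API call, so `repeats` should stay small.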

## 14. Key Takeaways and Insights

### When to Use Sub-Question Decomposition:
- **Comparison queries**: "Compare X and Y"
- **Multi-aspect questions**: Questions with multiple parts
- **Complex reasoning**: Requiring synthesis from multiple sources
- **Research tasks**: Comprehensive understanding needed

### Benefits:
1. **Better Coverage**: Ensures all aspects of a complex query are addressed
2. **Balanced Retrieval**: Each sub-question gets dedicated retrieval
3. **Interpretability**: Sub-questions make the reasoning process transparent
4. **Comprehensive Answers**: Synthesis of multiple retrieval passes

### Trade-offs:
- **Pros:**
  - Better coverage for multi-faceted queries
  - More comprehensive and balanced information
  - Transparent reasoning process
  - Extends to queries with many parts by adding sub-questions
  
- **Cons:**
  - Higher latency (multiple retrieval passes)
  - Increased cost (more LLM and embedding calls)
  - May be overkill for simple queries
  - Requires accurate tool descriptions, because the LLM uses them to route sub-questions
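
The latency and cost items above can be put into rough numbers. The estimate below is a back-of-the-envelope sketch under stated assumptions, not a measurement: it assumes one LLM call to decompose the query, one LLM call and one query embedding per sub-question, and one final LLM call for synthesis. Actual counts depend on the response synthesizer and on how much context each step has to process. `estimate_calls` is a hypothetical helper.

```python
def estimate_calls(num_sub_questions: int) -> dict:
    """Estimate API calls for one decomposed query under the stated assumptions."""
    if num_sub_questions < 0:
        raise ValueError("num_sub_questions must be non-negative")
    return {
        # 1 decomposition call + 1 answer per sub-question + 1 final synthesis
        "llm_calls": 1 + num_sub_questions + 1,
        # 1 query embedding per sub-question
        "embedding_calls": num_sub_questions,
    }


# Under the same assumptions, a single-pass baseline uses 1 LLM call and 1 embedding.
baseline = {"llm_calls": 1, "embedding_calls": 1}
decomposed = estimate_calls(5)
print(f"Baseline:   {baseline}")
print(f"Decomposed: {decomposed}")
```

For five sub-questions this gives seven LLM calls against one for the baseline, which is why selective use matters.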

### Best Practices:
1. **Tool Metadata**: Provide detailed, accurate descriptions for query engine tools
2. **Query Design**: Works best with questions that naturally decompose
3. **Context Size**: The synthesis step receives every sub-answer, so it may need a larger context window
4. **Selective Use**: Reserve for queries that truly need decomposition
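
Selective use implies some way of deciding which queries to decompose. One inexpensive option is a keyword heuristic that routes comparison-style or multi-part questions to the sub-question engine and everything else to the baseline. The patterns below are illustrative guesses, not a validated classifier, and `needs_decomposition` is a hypothetical helper.

```python
import re

# Illustrative patterns that often signal a comparison query (assumed, not validated).
COMPARISON_PATTERNS = [
    r"\bcompare\b",
    r"\bvs\.?\b",
    r"\bversus\b",
    r"\bdifferences? between\b",
]


def needs_decomposition(query: str) -> bool:
    """Heuristically decide whether a query is worth decomposing."""
    text = query.lower()
    if any(re.search(pattern, text) for pattern in COMPARISON_PATTERNS):
        return True
    # More than one question mark suggests a multi-part question.
    return text.count("?") > 1


for query in [
    "What is gradient boosting?",
    "Compare gradient boosting and random forests.",
    "What is bagging? How does it reduce variance?",
]:
    route = "sub-question engine" if needs_decomposition(query) else "baseline engine"
    print(f"{route}: {query}")
```

A heuristic like this will misroute some queries. An LLM-based classifier is an alternative, at the cost of one more call per query.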

### Real-World Applications:
- Research assistants (academic papers, technical docs)
- Competitive analysis
- Product comparison systems
- Multi-domain question answering
- Technical support systems requiring comprehensive answers

## 15. Summary

In this demo, we implemented and compared:

1. **Baseline RAG**: Single query → Single retrieval → Generate answer
2. **Sub-Question Decomposition**: Complex query → Decompose into sub-queries → Multiple parallel retrievals → Aggregate → Synthesize

The key insight is that complex queries often require information from multiple sources or perspectives. By automatically decomposing these queries into simpler sub-questions, we ensure comprehensive coverage and better synthesis.

This technique is particularly powerful for:
- Comparison and contrast questions
- Multi-hop reasoning tasks
- Questions requiring balanced coverage of multiple topics
- Research and analysis use cases

### Next Steps:
- Combine with HyDE for even better sub-question retrieval
- Experiment with multiple knowledge bases (coming in Demo #8: Agentic RAG)
- Try custom sub-question generation strategies
- Measure improvement with evaluation metrics (Demo #10)

---

**References:**
- MultiHop-RAG Paper: https://hf.co/papers/2401.15391
- BeamAggR Paper: https://hf.co/papers/2406.19820
- Microsoft Learn: "Build Advanced Retrieval-Augmented Generation Systems"