# Demo #2: Multi-Query Decomposition - Complex Query Handling

## Overview

This demo shows how **Sub-Query Decomposition** produces more complete answers to complex, multi-faceted questions by breaking them into simpler sub-queries and aggregating the results.

### Core Concepts:
- **Sub-Query Decomposition**: Breaking complex queries into manageable pieces
- **Parallel Retrieval**: Executing multiple searches independently
- **Context Aggregation**: Synthesizing information from multiple sources
- **Multi-Hop Reasoning**: Answering questions requiring multiple pieces of information

### Why Sub-Query Decomposition Works:
Complex questions often require information from multiple sources or perspectives:
- "Compare X and Y" requires retrieving information about both X and Y separately
- "What are the advantages and disadvantages..." needs retrieval focused on pros and cons
- Single-query approaches may retrieve incomplete or biased information

Sub-Query Decomposition solves this by:
1. Using an LLM to break the complex query into focused sub-questions
2. Retrieving relevant context for each sub-question independently
3. Aggregating all retrieved information
4. Synthesizing a comprehensive answer from the complete context
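
The four steps above can be sketched without any LLM or vector store. This is a toy illustration only: `CORPUS`, `decompose`, `retrieve`, and `answer` are hypothetical names, the decomposition is a hard-coded rule standing in for the LLM, and retrieval is plain word overlap standing in for vector search.

```python
# Toy corpus standing in for an indexed knowledge base (hypothetical content).
CORPUS = {
    "gradient_boosting": "Gradient boosting builds trees sequentially and can overfit noisy data.",
    "random_forest": "Random forests average many independent trees and are robust to noise.",
    "kmeans": "K-means clustering groups unlabeled points around centroids.",
}


def decompose(query: str) -> list[str]:
    """Stand-in for the LLM step: one sub-question per known topic named in the query."""
    topics = ["gradient boosting", "random forests"]
    found = [t for t in topics if t in query.lower()]
    # Fall back to the original query when no known topic is mentioned.
    return [f"What are the strengths and weaknesses of {t}?" for t in found] or [query]


def retrieve(question: str, top_k: int = 1) -> list[str]:
    """Stand-in for vector search: rank documents by word overlap with the question."""
    q_words = set(question.lower().replace("?", "").split())
    ranked = sorted(
        CORPUS.items(),
        key=lambda item: len(q_words & set(item[1].lower().replace(".", "").split())),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:top_k]]


def answer(query: str) -> dict:
    sub_questions = decompose(query)                          # step 1: decompose
    per_question = {q: retrieve(q) for q in sub_questions}    # step 2: retrieve per sub-question
    # Step 3: aggregate, dropping duplicates while preserving order.
    context = list(dict.fromkeys(d for docs in per_question.values() for d in docs))
    # Step 4 would pass `context` to an LLM for synthesis; here we just return it.
    return {"sub_questions": sub_questions, "context": context}


result = answer("Compare gradient boosting and random forests for classification.")
print(result["sub_questions"])
print(result["context"])
```

Each sub-question pulls its own best-matching document, so the aggregated context covers both algorithms even with `top_k=1` per retrieval.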

### Demo Structure:
1. Setup and data ingestion
2. Test baseline single-query approach
3. Implement sub-query decomposition
4. Compare results and coverage
5. Analyze multi-hop reasoning capabilities

## 1. Environment Setup and Dependencies

In [None]:
# Install required packages
# Run this cell only once
# !pip install llama-index llama-index-llms-azure-openai llama-index-embeddings-azure-openai python-dotenv

In [None]:
# Import required libraries
import os
from dotenv import load_dotenv
from pathlib import Path

# LlamaIndex core components
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    Settings
)
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata

# Azure OpenAI components
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding

print("✓ All imports successful")

## 2. Configure Azure OpenAI Connection

In [None]:
# Load environment variables
load_dotenv()

# Azure OpenAI configuration
api_key = os.getenv("AZURE_OPENAI_API_KEY")
azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
api_version = os.getenv("AZURE_OPENAI_API_VERSION", "2024-02-15-preview")
llm_deployment = os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME")
embedding_deployment = os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT")

# Validate configuration
if not all([api_key, azure_endpoint, llm_deployment, embedding_deployment]):
    raise ValueError("Missing required Azure OpenAI configuration. Check your .env file.")

print("✓ Azure OpenAI configuration loaded")
print(f"  Endpoint: {azure_endpoint}")
print(f"  LLM Deployment: {llm_deployment}")
print(f"  Embedding Deployment: {embedding_deployment}")
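
The validation above raises a generic error without saying which setting is absent. One possible refinement, sketched here under the assumption that the same four environment variable names are used, is to report the missing names explicitly. `REQUIRED_VARS` and `missing_settings` are hypothetical helpers, not part of any library.

```python
import os

# The four variables the configuration cell treats as required.
REQUIRED_VARS = [
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
    "AZURE_OPENAI_EMBEDDING_DEPLOYMENT",
]


def missing_settings(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with a plain dict instead of the real environment.
example_env = {"AZURE_OPENAI_API_KEY": "dummy", "AZURE_OPENAI_ENDPOINT": ""}
missing = missing_settings(example_env)
print(f"Missing settings: {missing}")
```

In the notebook, the result of `missing_settings()` could be interpolated into the `ValueError` message.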

In [None]:
# Initialize Azure OpenAI LLM
azure_llm = AzureOpenAI(
    model="gpt-4",
    deployment_name=llm_deployment,
    api_key=api_key,
    azure_endpoint=azure_endpoint,
    api_version=api_version,
    temperature=0.1,
)

# Initialize Azure OpenAI Embedding Model
azure_embed = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    deployment_name=embedding_deployment,
    api_key=api_key,
    azure_endpoint=azure_endpoint,
    api_version=api_version,
)

# Set global defaults
Settings.llm = azure_llm
Settings.embed_model = azure_embed
Settings.chunk_size = 512
Settings.chunk_overlap = 50

print("✓ Azure OpenAI models initialized")

## 3. Load and Process Documents

We'll use the same ML concepts knowledge base from Demo #1.

In [None]:
# Define data directory
data_dir = Path("./data/ml_concepts")

# Load documents
documents = SimpleDirectoryReader(
    input_dir=str(data_dir),
    required_exts=['.md']
).load_data()

print(f"✓ Loaded {len(documents)} documents")
for doc in documents:
    filename = Path(doc.metadata.get('file_name', 'unknown')).stem
    print(f"  - {filename} ({len(doc.text)} characters)")

In [None]:
# Create text splitter and parse documents
text_splitter = SentenceSplitter(
    chunk_size=512,
    chunk_overlap=50
)

nodes = text_splitter.get_nodes_from_documents(documents)

print(f"✓ Created {len(nodes)} text chunks")
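
To build intuition for `chunk_size` and `chunk_overlap`, here is a deliberately simplified sliding-window sketch. It splits on whole words, whereas `SentenceSplitter` counts tokens and tries to respect sentence boundaries, so the real chunk boundaries will differ. `sliding_chunks` is a hypothetical helper written for this illustration.

```python
def sliding_chunks(words, chunk_size, overlap):
    """Split a word list into windows of `chunk_size` that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        # Stop once a window has reached the end of the input.
        if start + chunk_size >= len(words):
            break
    return chunks


words = [f"w{i}" for i in range(10)]
chunks = sliding_chunks(words, chunk_size=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

The last word of each window reappears as the first word of the next one, which is what keeps a sentence that straddles a boundary retrievable from either chunk.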

In [None]:
# Create vector index
index = VectorStoreIndex(
    nodes=nodes,
    embed_model=azure_embed
)

print("✓ Vector index created")
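
Conceptually, querying a vector index means embedding the query and ranking chunks by similarity to it. The sketch below shows that ranking with cosine similarity on tiny hand-made vectors. The vectors and chunk names are invented for illustration, and real embeddings have far more dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunk_vecs, k=3):
    """Return the k chunk names most similar to the query, best first."""
    scored = sorted(
        ((cosine(query_vec, vec), name) for name, vec in chunk_vecs.items()),
        reverse=True,
    )
    return [(name, score) for score, name in scored[:k]]


# Hypothetical 3-dimensional "embeddings" for three chunks.
chunk_vecs = {
    "gb_overview": [0.9, 0.1, 0.0],
    "rf_overview": [0.1, 0.9, 0.0],
    "kmeans_overview": [0.0, 0.1, 0.9],
}
query_vec = [0.8, 0.6, 0.0]

hits = top_k(query_vec, chunk_vecs, k=2)
for name, score in hits:
    print(f"{name}: {score:.3f}")
```

Because a single query vector sits closer to one topic than the other, a small `k` can leave the second topic underrepresented, which is the limitation the baseline section explores.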

## 4. Baseline: Single-Query Approach

First, let's test a complex query that requires comparing multiple algorithms using a standard query engine.

In [None]:
# Create baseline query engine
baseline_engine = index.as_query_engine(
    llm=azure_llm,
    similarity_top_k=3  # Retrieve top 3 chunks
)

print("✓ Baseline query engine created")

In [None]:
# Define a complex, multi-faceted query
complex_query = "Compare the strengths and weaknesses of gradient boosting and random forests for classification tasks. When should I use each one?"

print(f"Complex Query: {complex_query}")
print("=" * 100)

In [None]:
# Execute baseline query
baseline_response = baseline_engine.query(complex_query)

print("\n📊 BASELINE APPROACH (Single Query)")
print("=" * 100)
print(f"\nAnswer:\n{baseline_response.response}")
print("\n" + "=" * 100)
print("\nRetrieved Chunks:")
for i, node in enumerate(baseline_response.source_nodes, 1):
    print(f"\n[Chunk {i}] Score: {node.score:.4f} | Source: {Path(node.metadata.get('file_name', 'unknown')).stem}")
    print(f"{node.text[:250]}...")
    print("-" * 100)

## 5. Implement Sub-Query Decomposition

Now let's use the `SubQuestionQueryEngine` to automatically decompose the complex query into sub-questions.

In [None]:
# Create a query engine tool
# This wraps our index with metadata describing what it contains
query_engine_tool = QueryEngineTool(
    query_engine=index.as_query_engine(similarity_top_k=3),
    metadata=ToolMetadata(
        name="ml_algorithms",
        description="Comprehensive knowledge base about machine learning algorithms including gradient boosting, random forests, neural networks, support vector machines, and k-means clustering. Contains information about how they work, their advantages, disadvantages, and use cases."
    )
)

print("✓ Query engine tool created")

In [None]:
# Create SubQuestionQueryEngine
subquestion_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=[query_engine_tool],
    llm=azure_llm,
    verbose=True,  # This will show us the sub-questions being generated
    use_async=False  # Run sub-questions one at a time; set to True to run them concurrently
)

print("✓ Sub-question query engine created")

## 6. Execute Sub-Query Decomposition

Watch how the engine automatically breaks down our complex query!

In [None]:
# Execute the same complex query with sub-question decomposition
print("\n🚀 SUB-QUESTION DECOMPOSITION APPROACH")
print("=" * 100)
print(f"\nOriginal Complex Query: {complex_query}")
print("\n" + "=" * 100)
print("\nGenerating and executing sub-questions...\n")

subquestion_response = subquestion_engine.query(complex_query)

print("\n" + "=" * 100)
print("\n✨ SYNTHESIZED ANSWER:")
print("=" * 100)
print(f"\n{subquestion_response.response}")

## 7. Analyze Sub-Questions Generated

Let's examine what sub-questions were automatically created.

In [None]:
# Extract and display sub-questions from the response metadata
print("\n📋 GENERATED SUB-QUESTIONS ANALYSIS")
print("=" * 100)

# `metadata` may be None, so normalize it before the membership test
response_metadata = getattr(subquestion_response, 'metadata', None) or {}

if 'sub_qa' in response_metadata:
    sub_qa_pairs = response_metadata['sub_qa']
    
    for i, (sub_q, sub_a) in enumerate(sub_qa_pairs, 1):
        print(f"\n[Sub-Question {i}]")
        print(f"Question: {sub_q.sub_q.sub_question}")
        print(f"Tool Used: {sub_q.sub_q.tool_name}")
        print(f"\nAnswer: {sub_a.response[:300]}...")
        print("-" * 100)
else:
    print("Sub-question metadata not available in this response format.")
    print("However, you should have seen the sub-questions in the verbose output above.")

## 8. Side-by-Side Comparison

Let's compare the coverage and quality of both approaches.

In [None]:
def compare_responses(baseline_resp, subq_resp, query):
    print("\n" + "=" * 100)
    print("📊 DETAILED COMPARISON: BASELINE vs. SUB-QUESTION DECOMPOSITION")
    print("=" * 100)
    
    print(f"\nOriginal Query: {query}")
    print("\n" + "-" * 100)
    
    # Compare answer lengths
    print("\n1. ANSWER LENGTH COMPARISON")
    print("-" * 100)
    print(f"Baseline answer: {len(baseline_resp.response)} characters")
    print(f"Sub-question answer: {len(subq_resp.response)} characters")
    
    # Compare source coverage
    print("\n2. SOURCE COVERAGE COMPARISON")
    print("-" * 100)
    
    baseline_sources = {Path(n.metadata.get('file_name', 'unknown')).stem for n in baseline_resp.source_nodes}
    print(f"\nBaseline sources ({len(baseline_sources)}): {baseline_sources}")
    print(f"Baseline retrieved {len(baseline_resp.source_nodes)} chunks total")
    
    # For sub-question engine, sources are distributed across sub-queries.
    # Nodes without a file_name (e.g. synthesized sub-answers) show up as 'unknown'
    # and are included in the node count below.
    subq_sources = {Path(n.metadata.get('file_name', 'unknown')).stem for n in subq_resp.source_nodes}
    print(f"\nSub-question sources ({len(subq_sources)}): {subq_sources}")
    print(f"Sub-question returned {len(subq_resp.source_nodes)} source nodes total")
    
    # Identify unique sources
    only_baseline = baseline_sources - subq_sources
    only_subq = subq_sources - baseline_sources
    shared = baseline_sources & subq_sources
    
    print(f"\nShared sources: {shared}")
    if only_baseline:
        print(f"Only in baseline: {only_baseline}")
    if only_subq:
        print(f"Only in sub-question: {only_subq}")
    
    # Compare answers
    print("\n3. GENERATED ANSWERS")
    print("-" * 100)
    print(f"\n📊 BASELINE ANSWER:\n{baseline_resp.response}")
    print(f"\n{'-' * 100}")
    print(f"\n🚀 SUB-QUESTION ANSWER:\n{subq_resp.response}")
    
    print("\n" + "=" * 100)

# Run comparison
compare_responses(baseline_response, subquestion_response, complex_query)

## 9. Test with Additional Complex Queries

Let's test with more queries that require multi-hop reasoning.

In [None]:
# Additional complex queries
additional_queries = [
    "Which algorithms are better for high-dimensional data, and what are their computational trade-offs?",
    "How do supervised learning algorithms differ in their approaches to finding decision boundaries, and which is most interpretable?",
    "What are the key differences between algorithms that handle outliers well versus those that don't?"
]

print("\n" + "=" * 100)
print("🔬 ADDITIONAL COMPLEX QUERY TESTS")
print("=" * 100)

for i, query in enumerate(additional_queries[:1], 1):  # Test just one to save time
    print(f"\n\n{'=' * 100}")
    print(f"Test Query {i}: {query}")
    print("=" * 100)
    
    # Baseline
    print("\n📊 Baseline Approach:")
    baseline_resp = baseline_engine.query(query)
    print(f"Sources: {[Path(n.metadata.get('file_name', 'unknown')).stem for n in baseline_resp.source_nodes]}")
    print(f"Answer length: {len(baseline_resp.response)} characters")
    
    # Sub-question
    print("\n🚀 Sub-Question Approach:")
    print("(Watch for sub-questions being generated...)\n")
    subq_resp = subquestion_engine.query(query)
    print(f"\nSources: {[Path(n.metadata.get('file_name', 'unknown')).stem for n in subq_resp.source_nodes]}")
    print(f"Answer length: {len(subq_resp.response)} characters")
    
    print(f"\n✨ Sub-Question Answer:\n{subq_resp.response}")

## 10. Visualize the Data Flow

Let's create a simple visualization of how sub-query decomposition works.

In [None]:
print("""
╔════════════════════════════════════════════════════════════════════════════════╗
║                    DATA FLOW: SUB-QUERY DECOMPOSITION                          ║
╚════════════════════════════════════════════════════════════════════════════════╝

BASELINE APPROACH (Single Query):
─────────────────────────────────

  Complex Query
       │
       ▼
  Embed Query
       │
       ▼
  Vector Search ────► Retrieve Top-K Chunks (may miss important context)
       │
       ▼
  LLM Generation
       │
       ▼
  Final Answer (limited by single retrieval pass)


SUB-QUESTION DECOMPOSITION APPROACH:
────────────────────────────────────

  Complex Query
       │
       ▼
  LLM Decomposes ────► Sub-Q1: "What are strengths of gradient boosting?"
       │              Sub-Q2: "What are weaknesses of gradient boosting?"
       │              Sub-Q3: "What are strengths of random forests?"
       │              Sub-Q4: "What are weaknesses of random forests?"
       │              Sub-Q5: "When to use each algorithm?"
       ▼
  Independent Retrieval (parallel when use_async=True):
       ├─────► Sub-Q1 → Vector Search → Chunks about GB strengths
       ├─────► Sub-Q2 → Vector Search → Chunks about GB weaknesses
       ├─────► Sub-Q3 → Vector Search → Chunks about RF strengths
       ├─────► Sub-Q4 → Vector Search → Chunks about RF weaknesses
       └─────► Sub-Q5 → Vector Search → Chunks about use cases
       │
       ▼
  Aggregate All Retrieved Contexts (comprehensive coverage)
       │
       ▼
  LLM Synthesis
       │
       ▼
  Final Answer (comprehensive, balanced, multi-faceted)


KEY ADVANTAGES:
───────────────
✓ Each sub-question targets specific information
✓ Multiple retrieval passes increase coverage
✓ Balanced information from different sources
✓ Better handling of comparative queries
✓ Reduced chance of missing critical context

TRADE-OFFS:
───────────
• Higher latency (multiple LLM calls + multiple retrievals)
• Increased token usage and cost
• More complex pipeline to debug
""")

## 11. Key Takeaways and Best Practices

### What We Learned:

1. **Complex queries benefit from decomposition**: Questions requiring comparisons, multi-faceted analysis, or information synthesis are better served by breaking them into sub-questions.

2. **Increased coverage**: Sub-query decomposition retrieves from multiple focused searches, resulting in more comprehensive context.

3. **Better balance**: For comparative queries ("X vs Y"), decomposition gives each side its own retrieval pass, so neither side is crowded out of the top-k results.

4. **Automatic planning**: The LLM decides which sub-questions are needed, so no manual query engineering is required.

### When to Use Sub-Query Decomposition:

✅ **Use when:**
- Query contains "compare", "contrast", "differences"
- Question has multiple parts or facets
- Requires synthesizing information from different sources
- Multi-hop reasoning is needed
- Comprehensive coverage is more important than speed

❌ **Avoid when:**
- Simple, focused queries
- Real-time latency requirements
- Limited API budget
- Single-source answers are sufficient
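
The guidance above could be turned into a simple routing check that sends only complex queries through decomposition. The sketch below is a naive keyword heuristic, not a recommended classifier: `COMPARISON_CUES` and `should_decompose` are hypothetical, and a production router would more likely use an LLM or a trained model.

```python
# Cue words taken from the "Use when" list, plus a couple of close variants (assumed).
COMPARISON_CUES = ("compare", "contrast", "difference", "versus", " vs ")


def should_decompose(query: str) -> bool:
    """Return True when the query looks comparative or multi-part."""
    q = f" {query.lower()} "          # pad so " vs " also matches at the edges
    has_cue = any(cue in q for cue in COMPARISON_CUES)
    multi_part = q.count("?") > 1     # several questions packed into one query
    return has_cue or multi_part


queries = [
    "Compare the strengths and weaknesses of gradient boosting and random forests "
    "for classification tasks. When should I use each one?",
    "What is k-means clustering?",
]
for q in queries:
    route = "sub-question engine" if should_decompose(q) else "baseline engine"
    print(f"{route}: {q[:60]}")
```

Routing simple queries to the baseline engine avoids paying the extra latency and token cost when a single retrieval pass is enough.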

### Best Practices:

1. **Tool metadata matters**: Provide clear, descriptive metadata for query engine tools so the LLM can generate relevant sub-questions.

2. **Use verbose mode during development**: See what sub-questions are being generated to debug and optimize.

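3. **Consider async execution**: Set `use_async=True` to run sub-questions concurrently, which reduces latency. In a notebook this may require calling `nest_asyncio.apply()` first, because an event loop is already running.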

4. **Balance top-k per sub-query**: Lower top-k per sub-question (e.g., 2-3) since you're doing multiple retrievals.

5. **Monitor costs**: Each sub-question is an additional LLM call and retrieval operation.
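
To see why async execution (practice 3) reduces latency, the sketch below simulates sub-question calls with `asyncio.sleep` and compares sequential against concurrent execution. It does not call LlamaIndex or any LLM: `answer_sub_question` and the 0.1-second delay are stand-ins. It is written to run as a script. Inside a notebook, where an event loop is already running, `await run_concurrent(...)` would be used instead of `asyncio.run`.

```python
import asyncio
import time


async def answer_sub_question(question: str, delay: float = 0.1) -> str:
    """Simulate one retrieval + LLM call that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return f"answer to: {question}"


async def run_sequential(questions):
    # Each call waits for the previous one to finish.
    return [await answer_sub_question(q) for q in questions]


async def run_concurrent(questions):
    # All calls are in flight at once; gather preserves input order.
    return list(await asyncio.gather(*(answer_sub_question(q) for q in questions)))


def timed(runner, questions):
    start = time.perf_counter()
    results = asyncio.run(runner(questions))
    return results, time.perf_counter() - start


questions = ["GB strengths?", "GB weaknesses?", "RF strengths?", "RF weaknesses?"]
seq_results, seq_time = timed(run_sequential, questions)
con_results, con_time = timed(run_concurrent, questions)

print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

Concurrency shortens wall-clock time only. The number of LLM calls and retrievals, and therefore the cost, stays the same.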

### Next Steps:

In Demo #3, we'll explore **Hybrid Search**, combining dense vector search with sparse keyword retrieval (BM25) to improve precision across different query types.