# Part 4: Complex Query Handling (Query Decomposition)

## Learning Objectives

By the end of this notebook, you will:
1. Understand when and why to decompose complex queries
2. Generate sub-questions from complex security questions
3. Implement sequential answering with context accumulation
4. Implement parallel answering with answer synthesis
5. Compare sequential vs parallel approaches
6. Handle multi-step reasoning for security scenarios

## The Problem with Complex Queries

Some security questions are too complex to answer in one retrieval:

### Example: "How do I secure my entire ML pipeline?"

This question spans multiple topics:
- **Data security**: Training data protection, poisoning prevention
- **Training security**: Secure training infrastructure, access controls
- **Model security**: Model theft prevention, integrity verification
- **Deployment security**: API security, rate limiting, monitoring
- **Inference security**: Input validation, output filtering
- **Monitoring**: Logging, anomaly detection, incident response

**Single retrieval limitations:**
- The top-k retrieved documents may cover only one or two of these aspects
- Raising k to compensate crowds the LLM's context window with loosely related text
- The resulting answer is superficial or incomplete

### Solution: Query Decomposition

Break the complex question into focused sub-questions:
1. "What are the security risks in ML training data?"
2. "How do I secure ML model storage and access?"
3. "What are the inference-time security considerations?"
4. "How do I monitor ML systems for security threats?"

Then answer each sub-question and synthesize results.

## Two Approaches

### 1. Sequential Decomposition
```
Question → Sub-Q1 → Answer1 → Sub-Q2 → Answer2 → ... → Final Answer
                      ↓          ↓
                    Context accumulates iteratively
```
- Answer each sub-question in order
- Pass previous Q&A pairs as context to next question
- Build comprehensive answer iteratively
- **Best for**: Dependent questions where later questions need earlier context

### 2. Parallel Decomposition
```
Question → Sub-Q1 → Answer1 ↘
        → Sub-Q2 → Answer2 → Synthesis → Final Answer
        → Sub-Q3 → Answer3 ↗
        (all concurrent)
```
- Answer all sub-questions independently
- Faster (parallel processing)
- Synthesize all answers into final response
- **Best for**: Independent questions that can be answered separately
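
The difference between the two control flows can be sketched without any LLM at all. The snippet below is a toy illustration, not the notebook's implementation: `run_sequential`, `run_parallel`, and `stub_answer` are hypothetical names, and the stub only reports how many earlier Q&A pairs it was given.

```python
from typing import Callable, List, Tuple

QAPair = Tuple[str, str]


def run_sequential(sub_questions: List[str],
                   answer: Callable[[str, List[QAPair]], str]) -> List[QAPair]:
    """Answer in order; each call sees every earlier Q&A pair."""
    history: List[QAPair] = []
    for sub_q in sub_questions:
        # Pass a copy so the answer function cannot mutate the history
        history.append((sub_q, answer(sub_q, list(history))))
    return history


def run_parallel(sub_questions: List[str],
                 answer: Callable[[str, List[QAPair]], str]) -> List[QAPair]:
    """Answer each sub-question in isolation (empty history every time)."""
    return [(sub_q, answer(sub_q, [])) for sub_q in sub_questions]


def stub_answer(sub_q: str, history: List[QAPair]) -> str:
    """Hypothetical stand-in for retrieval + LLM: reports the context it saw."""
    return f"answer to '{sub_q}' (saw {len(history)} earlier pairs)"


subs = ["data risks?", "model storage?", "monitoring?"]
seq = run_sequential(subs, stub_answer)
par = run_parallel(subs, stub_answer)
print(seq[2][1])  # third sequential answer saw 2 earlier pairs
print(par[2][1])  # third parallel answer saw 0 earlier pairs
```

Because the parallel calls never read each other's output, they are the ones that can safely be dispatched concurrently.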

---
## 1. Environment Setup

In [None]:
# Import required libraries
import os
from dotenv import load_dotenv
from typing import List, Dict
import asyncio

# LangChain imports
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_core.documents import Document

# Load environment variables
load_dotenv()

if not os.getenv("OPENAI_API_KEY"):
    print("⚠️  WARNING: OPENAI_API_KEY not found")
else:
    print("✅ OpenAI API key loaded")

In [None]:
# Initialize embeddings and LLM
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",
    openai_api_key=os.getenv("OPENAI_API_KEY")
)

llm = ChatOpenAI(
    model="gpt-4",
    temperature=0,
    openai_api_key=os.getenv("OPENAI_API_KEY")
)

print("✅ Embeddings and LLM initialized")

In [None]:
# Load vector store
vectorstore = Chroma(
    collection_name="owasp_llm_top10",
    embedding_function=embeddings,
    persist_directory="../data/chroma_db"
)

retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 3}
)

print("✅ Vector store loaded")
print(f"   Collection: {vectorstore._collection.count()} documents")

---
## 2. Query Decomposition

First, we need to break down complex questions into manageable sub-questions.

In [None]:
# Prompt template for decomposing questions
decomposition_template = """You are an AI security expert assistant.
Your task is to break down a complex security question into 3-5 focused sub-questions that together fully address the original question.

Guidelines:
1. Each sub-question should be specific and answerable independently
2. Sub-questions should cover different aspects of the main question
3. Order sub-questions logically (e.g., from broad to specific, or chronologically)
4. Focus on actionable, practical security topics
5. Ensure sub-questions are relevant to LLM/ML security when applicable

Original question: {question}

Provide 3-5 sub-questions, one per line, numbered:"""

decomposition_prompt = ChatPromptTemplate.from_template(decomposition_template)

# Create decomposition chain
decompose_chain = (
    decomposition_prompt
    | llm
    | StrOutputParser()
    | (lambda x: [line.strip() for line in x.split('\n') if line.strip() and any(c.isdigit() for c in line[:3])])
)

print("✅ Query decomposition chain created")

In [None]:
# Test decomposition
test_question = "How do I secure my ML deployment pipeline?"

print("=" * 80)
print(f"❓ Complex Question: {test_question}")
print("=" * 80)

sub_questions = decompose_chain.invoke({"question": test_question})

print("\n📋 Sub-Questions Generated:\n")
for i, sq in enumerate(sub_questions, 1):
    # Remove number prefix if present
    clean_sq = sq.lstrip('0123456789.').strip()
    print(f"{i}. {clean_sq}")

---
## 3. Sequential Decomposition

Answer sub-questions one at a time, passing previous Q&A pairs as context.

### Benefits:
- **Context accumulation**: Later answers can reference earlier ones
- **Coherent narrative**: Builds a logical flow
- **Dependency handling**: Later questions can depend on earlier answers

### Drawbacks:
- **Sequential latency**: Must wait for each answer before proceeding
- **Error propagation**: Mistakes in early answers affect later ones
- **Token usage**: Context grows with each iteration
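
One way to limit the token growth is to pass only the most recent Q&A pairs and cap the length of each answer. The sketch below is one possible variant of a Q&A formatter, not part of the notebook's pipeline; `format_recent_qa`, `max_pairs`, and `max_answer_chars` are hypothetical names, and the default limits are arbitrary.

```python
from typing import List, Tuple


def format_recent_qa(qa_pairs: List[Tuple[str, str]],
                     max_pairs: int = 2,
                     max_answer_chars: int = 500) -> str:
    """Format only the most recent Q&A pairs, truncating long answers."""
    if not qa_pairs or max_pairs <= 0:
        return "(No previous Q&A pairs yet)"
    start = max(0, len(qa_pairs) - max_pairs)
    formatted = []
    # Keep the original numbering so Q3 is still labelled Q3
    for i, (q, a) in enumerate(qa_pairs[start:], start + 1):
        if len(a) > max_answer_chars:
            a = a[:max_answer_chars].rstrip() + " [truncated]"
        formatted.append(f"Q{i}: {q}\nA{i}: {a}")
    return "\n\n".join(formatted)


pairs = [("q1", "a1"), ("q2", "a2"), ("q3", "x" * 600)]
text = format_recent_qa(pairs, max_pairs=2, max_answer_chars=10)
print(text)
```

The trade-off is that dropped or truncated pairs are no longer available for later sub-questions to reference, so the limits need tuning against answer quality.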

In [None]:
# Prompt for answering sub-questions (sequential)
sequential_answer_template = """You are an AI security expert assistant.

You are answering a series of sub-questions to address a complex security question.

Original question: {original_question}

Previous Q&A pairs (if any):
{previous_qa}

Current sub-question: {sub_question}

Retrieved context for this sub-question:
{context}

Instructions:
1. Answer the current sub-question using the retrieved context
2. Reference previous answers if relevant
3. Be specific and cite security best practices
4. Keep your answer focused (2-3 paragraphs)
5. Mention relevant OWASP vulnerabilities if applicable

Answer:"""

sequential_answer_prompt = ChatPromptTemplate.from_template(sequential_answer_template)

print("✅ Sequential answering prompt created")

In [None]:
def format_docs(docs: List[Document]) -> str:
    """Format documents for context."""
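    # .get() avoids a KeyError when a document lacks 'id' or 'title' metadata
    return "\n\n".join(
        f"Document {i+1} ({doc.metadata.get('id', 'unknown')} - "
        f"{doc.metadata.get('title', 'untitled')}):\n{doc.page_content}"
        for i, doc in enumerate(docs)
    )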

def format_previous_qa(qa_pairs: List[tuple]) -> str:
    """Format previous Q&A pairs."""
    if not qa_pairs:
        return "(No previous Q&A pairs yet)"
    
    formatted = []
    for i, (q, a) in enumerate(qa_pairs, 1):
        formatted.append(f"Q{i}: {q}\nA{i}: {a}")
    return "\n\n".join(formatted)

print("✅ Formatting helpers created")

In [None]:
def sequential_decomposition(question: str, retriever, llm) -> str:
    """
    Answer a complex question using sequential decomposition.
    
    Args:
        question: Complex question to answer
        retriever: Vector store retriever
        llm: Language model
        
    Returns:
        Final synthesized answer
    """
    print(f"\n🔄 Sequential Decomposition: Processing...\n")
    
    # Step 1: Decompose question
    print("1️⃣  Decomposing question...")
    sub_questions = decompose_chain.invoke({"question": question})
    sub_questions = [sq.lstrip('0123456789.').strip() for sq in sub_questions]
    print(f"   Generated {len(sub_questions)} sub-questions\n")
    
    # Step 2: Answer sub-questions sequentially
    qa_pairs = []
    
    for i, sub_q in enumerate(sub_questions, 1):
        print(f"2️⃣  Answering sub-question {i}/{len(sub_questions)}...")
        print(f"   Q: {sub_q}")
        
        # Retrieve context for this sub-question
        docs = retriever.invoke(sub_q)  # replaces the deprecated get_relevant_documents()
        context = format_docs(docs)
        
        # Format previous Q&A pairs
        previous_qa = format_previous_qa(qa_pairs)
        
        # Generate answer
        prompt_value = sequential_answer_prompt.invoke({
            "original_question": question,
            "previous_qa": previous_qa,
            "sub_question": sub_q,
            "context": context
        })
        
        response = llm.invoke(prompt_value)
        answer = response.content
        
        # Store Q&A pair
        qa_pairs.append((sub_q, answer))
        print(f"   ✓ Answered (context accumulated)\n")
    
    # Step 3: Synthesize final answer
    print("3️⃣  Synthesizing final answer...\n")
    
    synthesis_prompt = ChatPromptTemplate.from_template(
        """You are synthesizing a comprehensive answer to a complex security question.

Original question: {question}

Sub-questions and answers:
{qa_pairs}

Instructions:
1. Create a cohesive, comprehensive answer to the original question
2. Integrate insights from all sub-answers
3. Maintain a logical flow and structure
4. Highlight key security recommendations
5. Keep the answer comprehensive but concise (5-7 paragraphs)

Comprehensive Answer:"""
    )
    
    qa_text = format_previous_qa(qa_pairs)
    final_prompt = synthesis_prompt.invoke({"question": question, "qa_pairs": qa_text})
    final_response = llm.invoke(final_prompt)
    
    print("✅ Sequential decomposition complete!\n")
    return final_response.content

print("✅ Sequential decomposition function created")

In [None]:
# Test sequential decomposition
print("=" * 80)
print("🧪 Testing Sequential Decomposition")
print("=" * 80)

question = "How do I secure my ML deployment pipeline?"
print(f"\n❓ Question: {question}\n")

answer = sequential_decomposition(question, retriever, llm)

print("\n" + "=" * 80)
print("📄 FINAL ANSWER (Sequential Approach)")
print("=" * 80)
print(answer)
print("\n" + "=" * 80)

---
## 4. Parallel Decomposition

Answer all sub-questions independently, then synthesize.

### Benefits:
- **Faster**: All sub-questions answered concurrently
- **Independent**: No error propagation between sub-answers
- **Simpler**: Each sub-question answered in isolation

### Drawbacks:
- **No context sharing**: Sub-answers can't reference each other
- **Synthesis complexity**: Must intelligently combine independent answers
- **Potential redundancy**: Sub-answers may overlap without coordination

In [None]:
# Prompt for answering sub-questions (parallel)
parallel_answer_template = """You are an AI security expert assistant.

You are answering ONE sub-question that is part of a larger complex question.

Original question: {original_question}

Sub-question to answer: {sub_question}

Retrieved context:
{context}

Instructions:
1. Answer this specific sub-question using the retrieved context
2. Be comprehensive but focused on this aspect
3. Cite security best practices and specific recommendations
4. Mention relevant OWASP vulnerabilities if applicable
5. Keep your answer self-contained (2-3 paragraphs)

Answer:"""

parallel_answer_prompt = ChatPromptTemplate.from_template(parallel_answer_template)

print("✅ Parallel answering prompt created")

In [None]:
def answer_sub_question(sub_question: str, original_question: str, retriever, llm) -> tuple:
    """
    Answer a single sub-question.
    
    Args:
        sub_question: The sub-question to answer
        original_question: The original complex question
        retriever: Vector store retriever
        llm: Language model
        
    Returns:
        Tuple of (sub_question, answer)
    """
    # Retrieve context
    docs = retriever.invoke(sub_question)  # replaces the deprecated get_relevant_documents()
    context = format_docs(docs)
    
    # Generate answer
    prompt_value = parallel_answer_prompt.invoke({
        "original_question": original_question,
        "sub_question": sub_question,
        "context": context
    })
    
    response = llm.invoke(prompt_value)
    return (sub_question, response.content)

print("✅ Sub-question answering function created")

In [None]:
def parallel_decomposition(question: str, retriever, llm) -> str:
    """
    Answer a complex question using parallel decomposition.
    
    Args:
        question: Complex question to answer
        retriever: Vector store retriever
        llm: Language model
        
    Returns:
        Final synthesized answer
    """
    print(f"\n🔄 Parallel Decomposition: Processing...\n")
    
    # Step 1: Decompose question
    print("1️⃣  Decomposing question...")
    sub_questions = decompose_chain.invoke({"question": question})
    sub_questions = [sq.lstrip('0123456789.').strip() for sq in sub_questions]
    print(f"   Generated {len(sub_questions)} sub-questions\n")
    
    # Step 2: Answer all sub-questions in parallel
    print(f"2️⃣  Answering {len(sub_questions)} sub-questions in parallel...")
    
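    # NOTE: this demo loop still answers the sub-questions one at a time.
    # They are independent of each other, so a production version could run
    # them concurrently (for example with asyncio) to get the actual speedup.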
    qa_pairs = []
    for i, sub_q in enumerate(sub_questions, 1):
        print(f"   Answering {i}/{len(sub_questions)}: {sub_q[:50]}...")
        qa_pair = answer_sub_question(sub_q, question, retriever, llm)
        qa_pairs.append(qa_pair)
    
    print(f"   ✓ All {len(qa_pairs)} sub-questions answered\n")
    
    # Step 3: Synthesize final answer
    print("3️⃣  Synthesizing final answer from all sub-answers...\n")
    
    synthesis_prompt = ChatPromptTemplate.from_template(
        """You are synthesizing a comprehensive answer to a complex security question.

Original question: {question}

Sub-questions and their independent answers:
{qa_pairs}

Instructions:
1. Create a cohesive, comprehensive answer to the original question
2. Integrate insights from all sub-answers into a logical flow
3. Remove redundancy while preserving key information
4. Organize by themes or stages (e.g., prevention, detection, response)
5. Highlight the most critical security recommendations
6. Keep the answer comprehensive but well-structured (5-7 paragraphs)

Comprehensive Answer:"""
    )
    
    qa_text = format_previous_qa(qa_pairs)
    final_prompt = synthesis_prompt.invoke({"question": question, "qa_pairs": qa_text})
    final_response = llm.invoke(final_prompt)
    
    print("✅ Parallel decomposition complete!\n")
    return final_response.content

print("✅ Parallel decomposition function created")
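
Since the loop in `parallel_decomposition` runs the calls one after another, here is a minimal sketch of how the sub-questions could be dispatched concurrently. It assumes Python 3.9+ (for `asyncio.to_thread`) and wraps a blocking function in worker threads. `answer_all_concurrently` and `slow_stub` are hypothetical names, and the stub merely sleeps to imitate retrieval plus LLM latency.

```python
import asyncio
import time
from typing import Callable, List, Tuple


async def answer_all_concurrently(
    sub_questions: List[str],
    answer_fn: Callable[[str], Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Run a blocking answer function for every sub-question at once."""
    tasks = [asyncio.to_thread(answer_fn, sq) for sq in sub_questions]
    # gather() returns results in the same order as the awaitables passed in
    return list(await asyncio.gather(*tasks))


def slow_stub(sub_question: str) -> Tuple[str, str]:
    """Hypothetical stand-in for answer_sub_question; sleeps to mimic latency."""
    time.sleep(0.2)
    return (sub_question, f"answer to {sub_question}")


start = time.perf_counter()
qa_pairs = asyncio.run(answer_all_concurrently(["a", "b", "c"], slow_stub))
elapsed = time.perf_counter() - start
print(qa_pairs)
print(f"elapsed: {elapsed:.2f}s")
```

To use this with the real pipeline, `answer_fn` could be something like `lambda sq: answer_sub_question(sq, question, retriever, llm)`. Inside a notebook an event loop is already running, so `asyncio.run(...)` raises a `RuntimeError`; use `qa_pairs = await answer_all_concurrently(...)` in the cell instead.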

In [None]:
# Test parallel decomposition
print("=" * 80)
print("🧪 Testing Parallel Decomposition")
print("=" * 80)

question = "How do I secure my ML deployment pipeline?"
print(f"\n❓ Question: {question}\n")

answer = parallel_decomposition(question, retriever, llm)

print("\n" + "=" * 80)
print("📄 FINAL ANSWER (Parallel Approach)")
print("=" * 80)
print(answer)
print("\n" + "=" * 80)

---
## 5. Comprehensive Comparison

Let's compare both approaches on different types of complex security questions.

In [None]:
import time

def compare_decomposition_methods(question: str):
    """
    Compare sequential vs parallel decomposition.
    """
    print("\n" + "=" * 80)
    print(f"❓ COMPLEX QUESTION: {question}")
    print("=" * 80)
    
    # Sequential
    print("\n1️⃣  SEQUENTIAL DECOMPOSITION")
    print("-" * 80)
    start = time.time()
    seq_answer = sequential_decomposition(question, retriever, llm)
    seq_time = time.time() - start
    
    print(f"\n⏱️  Time: {seq_time:.2f}s")
    print(f"\n📄 Answer Preview:\n{seq_answer[:300]}...\n")
    
    # Parallel
    print("\n2️⃣  PARALLEL DECOMPOSITION")
    print("-" * 80)
    start = time.time()
    par_answer = parallel_decomposition(question, retriever, llm)
    par_time = time.time() - start
    
    print(f"\n⏱️  Time: {par_time:.2f}s (would be faster with true async)")
    print(f"\n📄 Answer Preview:\n{par_answer[:300]}...\n")
    
    # Comparison
    print("\n" + "=" * 80)
    print("📊 COMPARISON")
    print("=" * 80)
    print(f"Sequential Time: {seq_time:.2f}s")
    print(f"Parallel Time:   {par_time:.2f}s")
    print(f"\nSequential Answer Length: {len(seq_answer)} characters")
    print(f"Parallel Answer Length:   {len(par_answer)} characters")
    print("\n" + "=" * 80 + "\n")

print("✅ Comparison function created")

### Test Case 1: End-to-End ML Security

In [None]:
compare_decomposition_methods(
    "What are the comprehensive security measures for deploying an LLM-based application in production?"
)

### Test Case 2: Attack Surface Analysis

In [None]:
compare_decomposition_methods(
    "What are all the ways an attacker can compromise an ML system, and how do I defend against each?"
)

### Test Case 3: Incident Response Planning

In [None]:
compare_decomposition_methods(
    "If I suspect my LLM has been compromised, what steps should I take to investigate and respond?"
)

---
## 6. Analysis and Recommendations

### When to Use Sequential Decomposition

**Best for:**
- Questions with **dependent sub-topics** (later parts need earlier context)
- **Narrative-style answers** (story-like flow)
- **Step-by-step procedures** (each step builds on previous)
- **Smaller sub-question count** (3-4 sub-questions)

**Examples:**
- "Walk me through securing an ML pipeline from data collection to deployment"
- "What's the process for investigating a suspected model extraction attack?"
- "How do I implement defense-in-depth for my LLM application?"

**Pros:**
- ✅ Coherent narrative flow
- ✅ Sub-answers can reference each other
- ✅ Better for procedural knowledge
- ✅ More natural synthesis

**Cons:**
- ❌ Slower (sequential execution)
- ❌ Growing context (higher token costs)
- ❌ Error propagation risk

---

### When to Use Parallel Decomposition

**Best for:**
- Questions with **independent sub-topics** (can be answered separately)
- **Large sub-question count** (5+ sub-questions)
- **Time-sensitive scenarios** (need faster responses)
- **Comprehensive coverage** (breadth over narrative)

**Examples:**
- "What are all the OWASP Top 10 LLM vulnerabilities and their mitigations?"
- "Compare different authentication methods for ML APIs"
- "What security controls should I implement across my ML stack?"

**Pros:**
- ✅ Faster (parallel execution possible)
- ✅ No error propagation
- ✅ Simpler sub-question handling
- ✅ Better for comprehensive coverage

**Cons:**
- ❌ No context sharing between sub-answers
- ❌ Potential redundancy in answers
- ❌ Synthesis can be challenging

---

### Hybrid Approach

For production systems, consider a **hybrid strategy**:

1. **Classify query complexity** (simple, moderate, complex)
2. **Identify dependencies** (are sub-questions dependent?)
3. **Route appropriately**:
   - Simple → Basic RAG (no decomposition)
   - Moderate or complex, independent sub-questions → Parallel decomposition
   - Moderate or complex, dependent sub-questions → Sequential decomposition
4. **Optimize for use case** (speed vs coherence)
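
The routing step can be sketched as a small pure function. This is an illustrative assumption rather than a tested policy: `Strategy` and `choose_strategy` are hypothetical names, the sub-question count is used as a stand-in for query complexity, and the `has_dependencies` flag is assumed to come from an upstream classifier (for example, an extra LLM prompt).

```python
from enum import Enum


class Strategy(Enum):
    BASIC_RAG = "basic_rag"
    PARALLEL = "parallel"
    SEQUENTIAL = "sequential"


def choose_strategy(num_sub_questions: int, has_dependencies: bool) -> Strategy:
    """Pick a strategy from two signals produced by an upstream classifier."""
    if num_sub_questions <= 1:
        # Nothing to decompose: a single retrieval is enough
        return Strategy.BASIC_RAG
    if has_dependencies:
        # Later sub-questions need earlier answers as context
        return Strategy.SEQUENTIAL
    return Strategy.PARALLEL


print(choose_strategy(1, False))
print(choose_strategy(4, False))
print(choose_strategy(4, True))
```

Keeping the decision in one function makes it easy to log which route each query took and to adjust the thresholds later.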

### Cost Analysis

**Sequential Decomposition:**
- Decomposition: 1 LLM call
- Sub-answers: N LLM calls (with growing context)
- Synthesis: 1 LLM call
- **Total**: N + 2 LLM calls, higher tokens per call

**Parallel Decomposition:**
- Decomposition: 1 LLM call
- Sub-answers: N LLM calls (independent, smaller context)
- Synthesis: 1 LLM call
- **Total**: N + 2 LLM calls, lower tokens per call

**Conclusion**: Both approaches make the same number of LLM calls. Parallel is typically cheaper because its sub-answer prompts do not carry the accumulated Q&A history.
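
To make the difference concrete, the sketch below estimates input tokens for the N sub-answer calls only, since decomposition and synthesis are the same in both approaches. It assumes every call carries the same `base` tokens (instructions plus retrieved context) and that each earlier Q&A pair adds `qa` tokens. The function name and all numbers are illustrative assumptions, not measurements.

```python
def estimate_input_tokens(n: int, base: int, qa: int) -> dict:
    """Rough input-token totals for the N sub-answer calls.

    n:    number of sub-questions
    base: tokens per call for instructions + retrieved context (assumed equal)
    qa:   tokens that each earlier Q&A pair adds to a sequential prompt
    """
    parallel = n * base
    # Sequential call i (0-based) carries i earlier pairs:
    # 0 + 1 + ... + (n - 1) = n * (n - 1) / 2 pairs in total
    sequential = n * base + qa * n * (n - 1) // 2
    return {
        "parallel": parallel,
        "sequential": sequential,
        "extra": sequential - parallel,
    }


print(estimate_input_tokens(n=4, base=1500, qa=400))
# → {'parallel': 6000, 'sequential': 8400, 'extra': 2400}
```

The extra cost grows quadratically with the number of sub-questions, which is one reason sequential decomposition suits smaller sub-question counts.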

---
## 7. Summary and Key Takeaways

### What We Built

✅ Two query decomposition strategies and a framework for comparing them:
1. **Sequential Decomposition**: Iterative answering with context accumulation
2. **Parallel Decomposition**: Independent answering with synthesis
3. **Comparison Framework**: Systematic evaluation of both approaches

### Core Concepts Learned

1. **Query Decomposition**: Breaking complex questions into sub-questions
2. **Context Management**: Passing Q&A pairs as context (sequential)
3. **Independent Processing**: Answering without context sharing (parallel)
4. **Answer Synthesis**: Combining sub-answers into cohesive response
5. **Trade-offs**: Coherence vs speed, dependencies vs independence

### Key Insights

**Decomposition Benefits:**
- ↑↑ Handles complex, multi-faceted questions
- ↑ More comprehensive answers
- ↑ Better coverage of all aspects
- ✅ Essential for enterprise security analysis

**Sequential vs Parallel:**
- **Sequential**: Better coherence, slower, dependent sub-questions
- **Parallel**: Faster, independent, better for comprehensive coverage
- **Hybrid**: Route based on question characteristics

### Production Recommendations

1. **Start with classification**: Determine if decomposition is needed
2. **Use parallel by default**: Faster and cheaper for most cases
3. **Use sequential for procedures**: When order and dependencies matter
4. **Implement async for parallel**: True parallelism for speed gains
5. **Cache sub-answers**: Reuse answers for similar questions
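
As a starting point for the caching recommendation, here is a minimal in-memory sketch. It only reuses answers for questions that match exactly after light normalization (case, whitespace, trailing punctuation). Matching genuinely similar questions would need something like embedding similarity, which is not shown. `SubAnswerCache` and `stub_answer` are hypothetical names.

```python
from typing import Callable, Dict, List


class SubAnswerCache:
    """Exact-match cache keyed on a normalized sub-question."""

    def __init__(self, answer_fn: Callable[[str], str]) -> None:
        self._answer_fn = answer_fn
        self._store: Dict[str, str] = {}
        self.hits = 0

    @staticmethod
    def _normalize(question: str) -> str:
        # Lowercase, collapse whitespace, drop trailing "?", "." and spaces
        return " ".join(question.lower().split()).rstrip("?. ")

    def get(self, question: str) -> str:
        key = self._normalize(question)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        answer = self._answer_fn(question)
        self._store[key] = answer
        return answer


calls: List[str] = []


def stub_answer(question: str) -> str:
    """Hypothetical stand-in for the retrieval + LLM call."""
    calls.append(question)
    return f"answer #{len(calls)}"


cache = SubAnswerCache(stub_answer)
first = cache.get("How do I validate LLM inputs?")
second = cache.get("  how do I validate  LLM inputs ")
print(first, second, cache.hits, len(calls))
```

In a real deployment the cache would also need an eviction or expiry policy so that answers do not outlive updates to the underlying documents.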

### Next Steps

In **Part 5**, we'll add **Metadata Filtering**:
- Convert natural language to structured queries
- Filter by severity, date, affected systems
- Use LLM function calling for metadata extraction
- Build production-ready filtered retrieval

Example: "Show me critical CVEs affecting PyTorch from the last 6 months" →
```python
{
  "severity": "Critical",
  "product": "PyTorch",
  "date_range": {"start": "2024-04-20", "end": "2024-10-20"}
}
```

---

### 🎯 Practice Exercises

1. **Implement True Async**: Use `asyncio` for parallel decomposition
2. **Add Validation**: Check if sub-answers actually address sub-questions
3. **Optimize Synthesis**: Experiment with different synthesis prompts
4. **Add Confidence Scoring**: Rate answer quality for each sub-question
5. **Build Query Router**: Automatically choose sequential vs parallel

### 📚 Further Reading

- [LangChain Query Decomposition](https://python.langchain.com/docs/use_cases/query_decomposition)
- [Least-to-Most Prompting](https://arxiv.org/abs/2205.10625)
- [Chain-of-Thought Reasoning](https://arxiv.org/abs/2201.11903)
- [Tree of Thoughts](https://arxiv.org/abs/2305.10601)