# Notebook 14 (Industrial Edition): Parallel Multi-Hop Retrieval for Complex Questions

## Introduction: From Question Answering to AI Research Assistant

This notebook explores our final and most sophisticated RAG pattern: **Parallel Multi-Hop Retrieval**. This architecture elevates a RAG system from a simple fact-lookup tool into a genuine research agent capable of answering complex, comparative, and multi-step questions. It tackles queries that cannot be answered by any single document but require synthesizing information across multiple sources.

### The Core Concept: Decompose, Retrieve in Parallel, and Synthesize

The workflow mirrors how a human researcher would tackle a complex question:
1.  **Decompose:** A high-level "Meta-Agent" analyzes the complex user query and breaks it down into several simpler, independent sub-questions.
2.  **Scatter (Parallel Retrieval):** Each sub-question is dispatched to its own dedicated "Retrieval Agent". These agents run in parallel, each performing a standard RAG process to find the answer to its specific sub-question.
3.  **Gather & Synthesize:** The Meta-Agent collects the answers to all the sub-questions and then performs a final reasoning step to synthesize them into a single, comprehensive answer to the original complex query.
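
The three steps above can be sketched in plain Python before any framework is involved. This is a minimal illustration rather than the notebook's implementation: `decompose`, `answer_sub_question`, and `synthesize` are hypothetical stand-ins for the LLM-backed agents, and the decomposition is hard-coded.

```python
from concurrent.futures import ThreadPoolExecutor

def decompose(query: str) -> list[str]:
    # Hypothetical stand-in for the Meta-Agent: a real system would ask an LLM.
    return [
        "What is the power consumption of product A?",
        "What is the power consumption of product B?",
    ]

def answer_sub_question(question: str) -> str:
    # Hypothetical stand-in for one Retrieval Agent (retrieve + generate).
    return f"[answer to: {question}]"

def synthesize(query: str, answers: dict[str, str]) -> str:
    # Hypothetical stand-in for the final reasoning step.
    evidence = " ".join(answers[q] for q in answers)
    return f"{query} -> {evidence}"

def scatter_gather(query: str) -> str:
    sub_questions = decompose(query)
    # Scatter: one worker per sub-question; map() preserves input order.
    with ThreadPoolExecutor(max_workers=max(1, len(sub_questions))) as pool:
        results = list(pool.map(answer_sub_question, sub_questions))
    # Gather: pair each sub-question with its answer, then synthesize.
    return synthesize(query, dict(zip(sub_questions, results)))

result = scatter_gather("Compare the power consumption of A and B.")
print(result)
```

Because the sub-questions are independent, the scatter step needs no coordination between workers; only the gather step sees all the answers together.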

### Role in a Large-Scale System: Evolving RAG from Simple Q&A to Complex Research & Synthesis

This pattern applies to knowledge-intensive applications where a single question spans several entities or sources:
- **Financial Analysis:** Answering "Compare the Q1 revenue growth of our top three competitors."
- **Scientific Research:** Summarizing "What is the relationship between protein A and disease B, and what are the known therapeutic interventions?"
- **Legal Strategy:** Analyzing "What precedents exist for patent infringement cases involving software, and how do they differ from hardware cases?"

We will build a Simple RAG system and a Multi-Hop RAG system and compare them on the same question. The comparison shows that only the Multi-Hop system gathers the evidence needed to give an **accurate and insightful** answer to a complex comparative question.

## Part 1: Setup and Environment

### 1.1: Installing Dependencies

In [None]:
%pip install -U langchain langgraph langsmith langchain-huggingface transformers accelerate bitsandbytes torch langchain-community sentence-transformers faiss-cpu

### 1.2: API Keys and Environment Configuration

In [None]:
import os
import getpass

def _set_env(var: str):
    if not os.environ.get(var):
        os.environ[var] = getpass.getpass(f"{var}: ")

_set_env("LANGCHAIN_API_KEY")
_set_env("HUGGING_FACE_HUB_TOKEN")

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "Industrial - RAG Multi-Hop Retrieval"

## Part 2: Components for the Multi-Hop System

### 2.1: The Language Model (LLM)

We will use `meta-llama/Meta-Llama-3-8B-Instruct` for all our agents.

In [None]:
from langchain_huggingface import HuggingFacePipeline
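from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline
import torch

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Configure 4-bit quantization explicitly; bfloat16 is used for compute.
quantization_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto", quantization_config=quantization_config)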
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=2048, do_sample=False)
llm = HuggingFacePipeline(pipeline=pipe)

print("LLM Initialized. Ready to power our research agent.")

LLM Initialized. Ready to power our research agent.


### 2.2: Creating the Knowledge Base

We'll create a knowledge base with distinct, non-overlapping information about two different products. Because the retriever returns only the top two of the four documents, a single retrieval step for a comparative question is unlikely to surface the facts needed about both products.

In [None]:
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_core.documents import Document

kb_docs = [
    Document(page_content="The QLeap-V4 processor is designed for maximum performance in data centers. It consumes 1200W of power under full load and uses a specialized liquid cooling system to manage heat.", metadata={"product": "QLeap-V4"}),
    Document(page_content="Key features of the QLeap-V4 include 128 tensor cores and a 3nm process node, making it ideal for large-scale AI model training.", metadata={"product": "QLeap-V4"}),
    Document(page_content="The Eco-AI-M2 chip is designed for edge computing and mobile devices. Its primary feature is low power consumption, drawing only 15W under full load.", metadata={"product": "Eco-AI-M2"}),
    Document(page_content="Built on a 7nm process node, the Eco-AI-M2 has 8 specialized neural cores, making it perfect for real-time inference on devices like drones and smart cameras.", metadata={"product": "Eco-AI-M2"})
]

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(kb_docs, embedding=embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

print(f"Knowledge Base created with {len(kb_docs)} documents.")

Knowledge Base created with 4 documents.


### 2.3: Structured Data Models

We need a Pydantic model to structure the output of the decomposition step.

In [None]:
from pydantic import BaseModel, Field
from typing import List

class SubQuestions(BaseModel):
    """A list of independent sub-questions to be answered in parallel."""
    questions: List[str] = Field(description="A list of 2-3 simple, self-contained questions that, when answered together, will fully address the original complex query.")
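
Because the decomposition step depends on the LLM emitting well-formed structured output, it can help to see what validating that output involves. The sketch below uses only the standard library rather than Pydantic; `parse_sub_questions` is a hypothetical helper, and its 2-3 question bound mirrors the field description above.

```python
import json

def parse_sub_questions(raw: str) -> list[str]:
    """Extract and validate a {"questions": [...]} object from LLM text."""
    # LLMs often wrap JSON in prose, so take the outermost {...} span.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(raw[start:end + 1])
    questions = data.get("questions")
    if not isinstance(questions, list) or not all(isinstance(q, str) for q in questions):
        raise ValueError("'questions' must be a list of strings")
    if not 2 <= len(questions) <= 3:
        raise ValueError(f"expected 2-3 sub-questions, got {len(questions)}")
    return [q.strip() for q in questions]

raw_output = 'Sure! {"questions": ["What is X?", "What is Y?"]} Hope this helps.'
parsed = parse_sub_questions(raw_output)
print(parsed)
```

Failing loudly on malformed output is a deliberate choice here: a silently empty list of sub-questions would make the later stages produce an answer with no evidence behind it.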

## Part 3: The Baseline - A Simple RAG System

Let's first see how a standard RAG agent fails when faced with a complex, comparative question.

In [None]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

generator_prompt_template = (
    "You are an expert AI hardware analyst. Answer the user's question with high accuracy, based *only* on the following context. "
    "Synthesize the information into a clear, comparative answer.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)
generator_prompt = ChatPromptTemplate.from_template(generator_prompt_template)

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

simple_rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | generator_prompt
    | llm
    | StrOutputParser()
)
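
For readers new to LCEL, the dict at the head of the chain sends the same input to both keys, and each `|` passes the previous result to the next step. A rough plain-Python equivalent is sketched below; `fake_retrieve` and `fake_llm` are hypothetical stubs, not LangChain APIs, and the prompt is shortened.

```python
def fake_retrieve(question: str) -> list[str]:
    # Hypothetical stub for `retriever`: returns page contents directly.
    return ["Doc one about the topic.", "Doc two about the topic."]

def fake_llm(prompt: str) -> str:
    # Hypothetical stub for `llm`: reports the prompt length instead of generating.
    return f"(model saw {len(prompt)} characters)"

def simple_rag(question: str) -> str:
    # Step 1: the dict runs both branches on the same input.
    inputs = {
        "context": "\n\n".join(fake_retrieve(question)),  # retriever | format_docs
        "question": question,                              # RunnablePassthrough()
    }
    # Step 2: fill the prompt template.
    prompt = "Context:\n{context}\n\nQuestion: {question}".format(**inputs)
    # Step 3: call the model; StrOutputParser would then return plain text.
    return fake_llm(prompt)

answer = simple_rag("What is the topic?")
print(answer)
```

The key limitation is visible in step 1: retrieval happens exactly once, using the full question as the search query.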

## Part 4: Building the Multi-Hop RAG Graph

Now we build the advanced system. It will have nodes for decomposition, parallel retrieval, and final synthesis.

### 4.1: Graph State and Nodes

In [None]:
from typing import TypedDict, List, Dict, Annotated
from langchain_core.documents import Document
import operator
from concurrent.futures import ThreadPoolExecutor, as_completed

class MultiHopRAGState(TypedDict):
    original_question: str
    sub_questions: List[str]
    # Maps each sub-question to its answer. operator.or_ is the reducer: it merges
    # each update into the existing dict via dict union (requires Python 3.9+).
    sub_question_answers: Annotated[Dict[str, str], operator.or_]
    final_answer: str

# Node 1: Decomposer (The Meta-Agent)
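from langchain_core.output_parsers import PydanticOutputParser

# HuggingFacePipeline is a text-completion LLM without with_structured_output support,
# so the prompt requests JSON and PydanticOutputParser validates it into SubQuestions.
decomposer_parser = PydanticOutputParser(pydantic_object=SubQuestions)
decomposer_prompt = ChatPromptTemplate.from_template(
    "You are a query decomposition expert. Your job is to break down a complex question into simple, independent sub-questions that can be answered by a retrieval system. "
    "Do not try to answer the questions yourself.\n\n"
    "{format_instructions}\n\n"
    "Question: {question}"
).partial(format_instructions=decomposer_parser.get_format_instructions())
decomposer_chain = decomposer_prompt | llm | decomposer_parser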

def decomposer_node(state: MultiHopRAGState):
    print("--- [Meta-Agent] Decomposing complex question... ---")
    result = decomposer_chain.invoke({"question": state['original_question']})
    print(f"--- [Meta-Agent] Generated {len(result.questions)} sub-questions. ---")
    return {"sub_questions": result.questions}

# Node 2: Parallel Retrieval Agents
# This is a self-contained RAG chain that answers a single, simple question.
sub_question_rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | generator_prompt
    | llm
    | StrOutputParser()
)

def retrieval_agent_node(state: MultiHopRAGState):
    """Runs a RAG process for each sub-question in parallel."""
    print(f"--- [Retrieval Agents] Answering {len(state['sub_questions'])} sub-questions in parallel... ---")
    
    answers = {}
    # max_workers must be at least 1, even if decomposition returned no sub-questions.
    with ThreadPoolExecutor(max_workers=max(1, len(state['sub_questions']))) as executor:
        # Map each sub-question to the RAG chain
        future_to_question = {executor.submit(sub_question_rag_chain.invoke, q): q for q in state['sub_questions']}
        for future in as_completed(future_to_question):
            question = future_to_question[future]
            try:
                answer = future.result()
                answers[question] = answer
                print(f"  - Answer found for sub-question: '{question}'")
            except Exception as e:
                answers[question] = f"Error answering question: {e}"

    return {"sub_question_answers": answers}

# Node 3: Synthesizer (The Meta-Agent's final step)
synthesizer_prompt = ChatPromptTemplate.from_template(
    "You are a synthesis expert. Your job is to combine the answers to several sub-questions into a single, cohesive, and comprehensive answer to the user's original complex question.\n\n"
    "Original Question: {original_question}\n\n"
    "Sub-Question Answers:\n{sub_question_answers}"
)
synthesizer_chain = synthesizer_prompt | llm | StrOutputParser()

def synthesizer_node(state: MultiHopRAGState):
    print("--- [Meta-Agent] Synthesizing final answer... ---")
    
    sub_answers_str = "\n".join([f"- Q: {q}\n- A: {a}" for q, a in state['sub_question_answers'].items()])
    
    final_answer = synthesizer_chain.invoke({
        "original_question": state['original_question'],
        "sub_question_answers": sub_answers_str
    })
    return {"final_answer": final_answer}
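
The `Annotated[...]` field in the state attaches a reducer: a two-argument function that LangGraph calls to combine the existing value with a node's update instead of overwriting it. The sketch below imitates that merge step in plain Python. `apply_update` is a hypothetical helper, not LangGraph's API, and `merge_answers` is one possible dict-merging reducer.

```python
from typing import Callable, Dict

def merge_answers(current: Dict[str, str], update: Dict[str, str]) -> Dict[str, str]:
    # A reducer takes (existing value, new update) and returns the merged value.
    return {**current, **update}

def apply_update(state: dict, update: dict, reducers: Dict[str, Callable]) -> dict:
    # Hypothetical imitation of a graph runtime applying a node's partial update:
    # keys with a reducer are merged, all other keys are overwritten.
    new_state = dict(state)
    for key, value in update.items():
        if key in reducers and key in new_state:
            new_state[key] = reducers[key](new_state[key], value)
        else:
            new_state[key] = value
    return new_state

reducers = {"sub_question_answers": merge_answers}
state = {"original_question": "Compare A and B.", "sub_question_answers": {}}
state = apply_update(state, {"sub_question_answers": {"Q1": "A1"}}, reducers)
state = apply_update(state, {"sub_question_answers": {"Q2": "A2"}}, reducers)
state = apply_update(state, {"final_answer": "done"}, reducers)
print(state["sub_question_answers"])
```

In this graph a single node writes all the answers at once, so the reducer matters mainly if retrieval is later split across several nodes that each contribute part of the dict.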

### 4.2: Assembling the Graph

In [None]:
from langgraph.graph import StateGraph, END

workflow = StateGraph(MultiHopRAGState)

workflow.add_node("decompose", decomposer_node)
workflow.add_node("retrieve_in_parallel", retrieval_agent_node)
workflow.add_node("synthesize", synthesizer_node)

workflow.set_entry_point("decompose")
workflow.add_edge("decompose", "retrieve_in_parallel")
workflow.add_edge("retrieve_in_parallel", "synthesize")
workflow.add_edge("synthesize", END)

multi_hop_rag_app = workflow.compile()
print("Multi-Hop RAG graph compiled successfully.")

Multi-Hop RAG graph compiled successfully.


## Part 5: Head-to-Head Comparison

Let's ask our complex, comparative question. A single retrieval step will likely only find documents about one of the products, failing to gather the necessary context for a comparison.

In [None]:
user_query = "Compare the QLeap-V4 and the Eco-AI-M2, focusing on their target use case and power consumption."

### 5.1: Running the Simple RAG System

In [None]:
print("="*60)
print("                  SIMPLE RAG SYSTEM OUTPUT")
print("="*60 + "\n")

# We intercept the retrieval step to inspect the documents
simple_retrieved_docs = retriever.invoke(user_query)

print("Retrieved Context:")
print(format_docs(simple_retrieved_docs) + "\n")

simple_answer = simple_rag_chain.invoke(user_query)
print("Final Answer:")
print(simple_answer)

                  SIMPLE RAG SYSTEM OUTPUT

Retrieved Context:
The Eco-AI-M2 chip is designed for edge computing and mobile devices. Its primary feature is low power consumption, drawing only 15W under full load.
Built on a 7nm process node, the Eco-AI-M2 has 8 specialized neural cores, making it perfect for real-time inference on devices like drones and smart cameras.

Final Answer:
Based on the provided context, the Eco-AI-M2 chip is designed for edge computing and mobile devices, with a primary feature of low power consumption at only 15W under full load. The context does not contain information about the QLeap-V4, so I cannot provide a comparison.
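
The failure above can be made explicit by checking which products the retrieved context covers. The sketch below is a hypothetical diagnostic, not part of the notebook's pipeline: `FakeDoc` stands in for LangChain's `Document`, and `missing_products` assumes each document carries a `product` key in its metadata, as the knowledge base in Section 2.2 does.

```python
from dataclasses import dataclass, field

@dataclass
class FakeDoc:
    # Minimal stand-in for langchain_core.documents.Document.
    page_content: str
    metadata: dict = field(default_factory=dict)

def missing_products(docs, required):
    """Return the required products that no retrieved document covers, sorted."""
    covered = {doc.metadata.get("product") for doc in docs}
    return sorted(set(required) - covered)

# Mirrors the retrieval result shown above: both hits describe the same product.
retrieved = [
    FakeDoc("Eco-AI-M2 power details", {"product": "Eco-AI-M2"}),
    FakeDoc("Eco-AI-M2 core details", {"product": "Eco-AI-M2"}),
]
gaps = missing_products(retrieved, ["QLeap-V4", "Eco-AI-M2"])
print(gaps)
```

A non-empty result signals that the generator is being asked to compare products it has no context for, which is the gap the Multi-Hop system is designed to close.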


### 5.2: Running the Multi-Hop RAG System

In [None]:
inputs = {"original_question": user_query}
multi_hop_result = None
for output in multi_hop_rag_app.stream(inputs, stream_mode="values"):
    multi_hop_result = output

print("="*60)
print("                 MULTI-HOP RAG SYSTEM OUTPUT")
print("="*60 + "\n")

print("--- Sub-Question Answers ---")
for i, (q, a) in enumerate(multi_hop_result['sub_question_answers'].items()):
    print(f"{i+1}. Q: {q}")
    print(f"   A: {a}")

print("\n--- Final Synthesized Answer ---")
print(multi_hop_result['final_answer'])

--- [Meta-Agent] Decomposing complex question... ---
--- [Meta-Agent] Generated 2 sub-questions. ---
--- [Retrieval Agents] Answering 2 sub-questions in parallel... ---
  - Answer found for sub-question: 'What is the target use case and power consumption of the QLeap-V4?'
  - Answer found for sub-question: 'What is the target use case and power consumption of the Eco-AI-M2?'
--- [Meta-Agent] Synthesizing final answer... ---

                 MULTI-HOP RAG SYSTEM OUTPUT

--- Sub-Question Answers ---
1. Q: What is the target use case and power consumption of the QLeap-V4?
   A: The QLeap-V4 processor is designed for maximum performance in data centers, with a primary use case of large-scale AI model training. It consumes 1200W of power under full load.
2. Q: What is the target use case and power consumption of the Eco-AI-M2?
   A: The Eco-AI-M2 chip is designed for edge computing and mobile devices like drones and smart cameras. Its key feature is low power consumption, drawing only 15W 