# Day 6 - Lab 1: Building a Multi-Agent RAG System (Solution)

**Objective:** Build a RAG (Retrieval-Augmented Generation) system orchestrated by LangGraph, scaling in complexity from a single agent to a multi-agent team that can reason about a knowledge base.

**Introduction:**
This solution notebook provides the complete code and explanations for building the multi-agent RAG system. It demonstrates how to use LangGraph to create increasingly complex and capable agentic workflows.

## Step 1: Setup

In [None]:
import sys
import os

# Add the project's root directory to the Python path
try:
    # This works when running as a script
    project_root = os.path.abspath(os.path.join(os.path.dirname(__file__), '..', '..'))
except NameError:
    # This works when running in an interactive environment (like a notebook)
    # We go up two levels from the notebook's directory to the project root.
    project_root = os.path.abspath(os.path.join(os.getcwd(), '..', '..'))

if project_root not in sys.path:
    sys.path.insert(0, project_root)

In [None]:
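import importlib

def install_if_missing(import_name, pip_name=None):
    """Install a package with pip if it cannot be imported.

    Some distributions are imported under a different name than the one you
    pip-install (e.g. `faiss-cpu` is imported as `faiss`), so both can be given.
    """
    try:
        importlib.import_module(import_name)
    except ImportError:
        pip_name = pip_name or import_name
        print(f"{pip_name} not found, installing...")
        %pip install -q {pip_name}

install_if_missing('langgraph')
install_if_missing('langchain')
install_if_missing('langchain_community', 'langchain-community')
install_if_missing('langchain_openai', 'langchain-openai')
install_if_missing('langchain_text_splitters', 'langchain-text-splitters')
install_if_missing('faiss', 'faiss-cpu')
install_if_missing('pypdf')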

import os
import operator
from typing import TypedDict, Annotated, List
from langchain_core.documents import Document
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.document_loaders import PyPDFLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.graph import StateGraph, END
from langchain_core.messages import BaseMessage
from utils import setup_llm_client, load_artifact

client, model_name, api_provider = setup_llm_client(model_name="gpt-4o")
llm = ChatOpenAI(model=model_name)
embeddings = OpenAIEmbeddings()

## Step 2: Building the Knowledge Base

**Explanation:**
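The `create_knowledge_base` function loads each project document (PDFs with `PyPDFLoader`, everything else with `TextLoader`), splits the text into overlapping chunks of up to 1,000 characters with 200 characters of overlap, and builds a FAISS vector store from those chunks. The vector store converts each chunk into a numerical embedding, which enables efficient semantic search: finding chunks whose meaning is close to a query even when the wording differs. The function returns a `retriever` object, the component our agents use to query the knowledge base, or `None` if none of the files were found.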
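To make the `chunk_size`/`chunk_overlap` idea concrete, here is a simplified character-window chunker. It is *not* how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on paragraph, line, and word separators before falling back to single characters); `sliding_window_chunks` is a hypothetical helper that only shows why overlapping windows keep context that straddles a chunk boundary.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified illustration (hypothetical helper): each window starts
    (chunk_size - chunk_overlap) characters after the previous one.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij']
```

With the lab's settings (`chunk_size=1000, chunk_overlap=200`), each window in this simplified model would advance 800 characters, so the last 200 characters of one chunk reappear at the start of the next.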

In [None]:
def create_knowledge_base(file_paths):
    """Loads documents from given paths and creates a FAISS vector store."""
    all_docs = []
    for path in file_paths:
        if os.path.exists(path):
            if path.endswith(".pdf"):
                loader = PyPDFLoader(path)
            else:
                loader = TextLoader(path, encoding="utf-8")
            docs = loader.load()
            for doc in docs:
                doc.metadata["source"] = path  # Tag the source file while keeping loader metadata (e.g. PDF page numbers)
            all_docs.extend(docs)
        else:
            print(f"Warning: Artifact not found at {path}")

    if not all_docs:
        print("No documents found to create knowledge base.")
        return None

    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    splits = text_splitter.split_documents(all_docs)
    
    print(f"Creating vector store from {len(splits)} document splits...")
    vectorstore = FAISS.from_documents(documents=splits, embedding=embeddings)
    return vectorstore.as_retriever()

all_artifact_paths = ["artifacts/prd.md", "artifacts/schema.sql", "artifacts/adr_001_framework_choice.md"]
retriever = create_knowledge_base(all_artifact_paths)
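Under the hood, `retriever.invoke(question)` embeds the question and returns the stored chunks whose embeddings lie closest to it (four by default for a retriever created with `as_retriever()`). The toy sketch below shows that nearest-neighbour ranking with hand-made three-dimensional vectors standing in for real embeddings; `cosine_similarity`, `top_k_by_cosine`, and the chunk ids are hypothetical. LangChain's FAISS wrapper ranks by Euclidean (L2) distance by default rather than cosine similarity, but for unit-length vectors, which OpenAI embeddings are, the two produce the same ordering.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k_by_cosine(query: List[float], index: Dict[str, List[float]], k: int = 4) -> List[Tuple[str, float]]:
    """Return the k chunk ids most similar to the query, best first."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up vectors standing in for real OpenAI embeddings of three chunks
toy_index = {
    "prd_goal": [0.9, 0.1, 0.0],
    "schema_users_table": [0.1, 0.9, 0.2],
    "adr_framework": [0.0, 0.3, 0.9],
}
query_vec = [0.2, 0.8, 0.1]  # pretend this encodes "What database schema will we use?"

for chunk_id, score in top_k_by_cosine(query_vec, toy_index, k=2):
    print(f"{chunk_id}: {score:.3f}")
```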

## Step 3: The Challenges - Solutions

### Challenge 1 (Foundational): A Single-Agent RAG System

**Explanation:**
This is the simplest form of a LangGraph system. 
1.  **`AgentState`**: We define the 'state' of our graph using a `TypedDict`. This is the shared memory that all nodes in the graph can read from and write to.
2.  **Nodes**: Each node is a Python function that performs an action. The `retrieve_documents` node calls our retriever, and the `generate_answer` node calls the LLM.
3.  **Graph Definition**: We instantiate `StateGraph` and add our nodes. The `set_entry_point` and `add_edge` methods define the directed flow of the graph. `compile()` creates the runnable graph object.
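To make the shared-state idea concrete: for a linear graph like this one, the compiled app behaves roughly like the loop below. It starts from the input dict, runs each node in edge order, and merges the partial dict each node returns into the shared state, so a key written by one node (here `documents`) is visible to the next. `run_linear_graph`, `fake_retrieve`, and `fake_generate` are hypothetical stand-ins with no LLM or retriever calls. Real LangGraph also supports per-key reducers such as `Annotated[list, operator.add]` and handles branching and cycles, so treat this as a mental model rather than its implementation.

```python
from typing import Callable, Dict, List

State = Dict[str, object]
Node = Callable[[State], State]


def run_linear_graph(nodes: List[Node], inputs: State) -> State:
    """Run nodes in order; each returns a partial update merged into the state."""
    state: State = dict(inputs)
    for node in nodes:
        update = node(state)
        state.update(update)  # keys without a reducer: the latest write wins
    return state


def fake_retrieve(state: State) -> State:
    # Hypothetical stand-in for retriever.invoke(question)
    return {"documents": [f"chunk about: {state['question']}"]}


def fake_generate(state: State) -> State:
    # Hypothetical stand-in for llm.invoke(prompt).content
    return {"answer": f"Answer based on {len(state['documents'])} document(s)"}


final_state = run_linear_graph(
    [fake_retrieve, fake_generate],
    {"question": "What is the project's purpose?"},
)
print(final_state["answer"])  # → Answer based on 1 document(s)
```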

In [None]:
class AgentState(TypedDict):
    question: str
    documents: List[Document]
    answer: str

def retrieve_documents(state):
    print("---NODE: RETRIEVE DOCUMENTS---")
    question = state["question"]
    documents = retriever.invoke(question)
    return {"documents": documents, "question": question}

def generate_answer(state):
    print("---NODE: GENERATE ANSWER---")
    question = state["question"]
    documents = state["documents"]
    prompt = f"""You are an assistant for question-answering tasks. Use the following retrieved context to answer the question. If you don't know the answer, just say that you don't know.\n\nQuestion: {question}\n\nContext: {documents}\n\nAnswer:"""
    answer = llm.invoke(prompt).content
    return {"answer": answer}

workflow = StateGraph(AgentState)
workflow.add_node("RETRIEVE", retrieve_documents)
workflow.add_node("GENERATE", generate_answer)
workflow.set_entry_point("RETRIEVE")
workflow.add_edge("RETRIEVE", "GENERATE")
workflow.add_edge("GENERATE", END)

app_v1 = workflow.compile()

print("\n--- Invoking Single-Agent Graph ---")
inputs = {"question": "What is the purpose of this project according to the PRD?"}
result = app_v1.invoke(inputs)
print(f"Final Answer: {result['answer']}")
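In `generate_answer`, interpolating `{documents}` into the f-string embeds the Python repr of each `Document`, metadata and all. A common alternative is to pass only each chunk's text, labelled with its source file, which gives the model a cleaner prompt and lets it cite where an answer came from. The sketch below assumes a minimal stand-in class, `Doc` (hypothetical), exposing the same `page_content` and `metadata` attributes as LangChain's `Document`; `format_context` is likewise a hypothetical helper, and the document text is placeholder data.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document (same attribute names)."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def format_context(docs: List[Doc]) -> str:
    """Join chunk text, prefixing each chunk with a number and its source file."""
    parts = []
    for i, doc in enumerate(docs, start=1):
        source = doc.metadata.get("source", "unknown")
        parts.append(f"[{i}] (source: {source})\n{doc.page_content}")
    return "\n\n".join(parts)


docs = [
    Doc("Placeholder PRD text.", {"source": "artifacts/prd.md"}),
    Doc("Placeholder schema text.", {"source": "artifacts/schema.sql"}),
]
print(format_context(docs))
```

Because it only reads `page_content` and `metadata`, `format_context` should also accept the retriever's real `Document` objects, so the prompt could use `Context: {format_context(documents)}` instead of the raw list.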

### Challenge 2 (Intermediate): A Two-Agent System with a Grader

**Explanation:**
Here, we add a decision-making step.
1.  **Grader Node (`GRADE`):** The `grade_documents` node acts as a grader. It calls the LLM with a narrowly scoped prompt that asks for a single 'yes' or 'no' on whether the retrieved documents are relevant to the question.
2.  **Conditional Edge:** This is the key concept. `workflow_v2.add_conditional_edges` tells the graph to call a routing function (`decide_to_generate`) after the `GRADE` node finishes. The function reads the grade from the state and returns the name of the *next* node to run: `GENERATE` if the documents are relevant, or `END` to stop without answering. This dynamic routing keeps the system from producing an answer out of unrelated context.
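Because the grader's reply is free text, the routing check needs some tolerance: an exact comparison with `"yes"` rejects replies such as `"Yes."` or `"**Yes**"`. The sketch below isolates that decision so it can be tested without an LLM; it takes the first alphabetic word of the reply and compares it with `"yes"`. `parse_yes_no` and `route_after_grade` are hypothetical helpers, and the local `END` constant is assumed to match the string value of LangGraph's `END` (`"__end__"`).

```python
import re

END = "__end__"  # assumed to match the value of langgraph.graph.END


def parse_yes_no(reply: str) -> bool:
    """True if the first alphabetic word of the reply is 'yes' (case-insensitive)."""
    match = re.search(r"[a-z]+", reply.lower())
    return match is not None and match.group(0) == "yes"


def route_after_grade(grade_reply: str) -> str:
    """Map the grader's free-text reply to the name of the next node."""
    return "GENERATE" if parse_yes_no(grade_reply) else END


for reply in ["yes", "Yes.", "**Yes**", "  YES, the schema is mentioned", "No", "yesterday", ""]:
    print(f"{reply!r:36} -> {route_after_grade(reply)}")
```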

In [None]:
class GraderState(AgentState):
    grade: str

def grade_documents(state):
    print("---NODE: GRADE DOCUMENTS---")
    question = state["question"]
    documents = state["documents"]
    prompt = f"""You are a grader assessing the relevance of retrieved documents to a user question. If the documents contain keywords or meaning related to the user question, grade them as relevant. It does not need to be a stringent test; the goal is to filter out erroneous retrievals. Answer with a single word: 'yes' or 'no'.\n\nRetrieved Documents: {documents}\n\nUser Question: {question}"""
    grade = llm.invoke(prompt).content
    return {"grade": grade}

def decide_to_generate(state):
    print("---EDGE: DECIDE TO GENERATE---")
    # Normalize the reply so variants like "Yes", "yes." or " yes\n" count as relevant
    if state["grade"].strip().lower().startswith("yes"):
        print("DECISION: Documents are relevant. Proceed to generation.")
        return "GENERATE"
    else:
        print("DECISION: Documents are not relevant. End process.")
        return END

workflow_v2 = StateGraph(GraderState)
workflow_v2.add_node("RETRIEVE", retrieve_documents)
workflow_v2.add_node("GRADE", grade_documents)
workflow_v2.add_node("GENERATE", generate_answer)

workflow_v2.set_entry_point("RETRIEVE")
workflow_v2.add_edge("RETRIEVE", "GRADE")
workflow_v2.add_conditional_edges(
    "GRADE",
    decide_to_generate,
)
workflow_v2.add_edge("GENERATE", END)

app_v2 = workflow_v2.compile()

print("\n--- Invoking Two-Agent Graph with a relevant question ---")
inputs = {"question": "What database schema will we use?"}
result = app_v2.invoke(inputs)
print(f"Final Answer: {result.get('answer', 'Could not answer question.')}")

print("\n--- Invoking Two-Agent Graph with an irrelevant question ---")
inputs = {"question": "What is the weather in Paris?"}
result = app_v2.invoke(inputs)
print(f"Final Answer: {result.get('answer', 'Could not answer question.')}")
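`grade_documents` judges all retrieved chunks in a single LLM call, so one relevant chunk can carry irrelevant ones through to `GENERATE`. A common refinement is to grade each document on its own and keep only those that pass. The sketch below shows just that filtering step on plain strings. `filter_relevant` and `keyword_grader` are hypothetical, and the keyword grader is a deterministic stand-in for an LLM call; in practice `is_relevant` would prompt the LLM about one document and parse its yes/no reply.

```python
from typing import Callable, List


def filter_relevant(question: str, docs: List[str], is_relevant: Callable[[str, str], bool]) -> List[str]:
    """Grade each document separately and keep only those judged relevant."""
    return [doc for doc in docs if is_relevant(question, doc)]


def keyword_grader(question: str, doc: str) -> bool:
    """Hypothetical stand-in for an LLM grader: relevant if any question word
    of 5+ letters appears in the document (case-insensitive)."""
    words = [w.strip("?.,!").lower() for w in question.split()]
    keywords = [w for w in words if len(w) >= 5]
    text = doc.lower()
    return any(k in text for k in keywords)


question = "What database schema will we use?"
docs = [
    "Schema: CREATE TABLE users (id INTEGER PRIMARY KEY);",
    "The database stores one row per user.",
    "Lunch menu for the team offsite.",
]
print(filter_relevant(question, docs, keyword_grader))
```

Grading per document costs one LLM call per chunk instead of one per question, which is the main trade-off against the single-call grader.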

### Challenge 3 (Advanced): A 5-Agent Research Team with Human-in-the-Loop

**Explanation:**
This is a highly advanced workflow that mimics a real research team.
1.  **Specialized Retrievers:** We create two separate vector stores and retrievers. This specialization allows us to direct queries to the most relevant knowledge source.
2.  **Router/PM Agent:** The `project_manager_agent` node acts as a router. It uses the LLM's reasoning ability to decide which specialist (PRD or Tech researcher) should handle the query.
3.  **Human-in-the-Loop:** The `human_validation` node is a critical pattern for responsible AI. It explicitly pauses the automated process and requires human confirmation before proceeding. This is essential for tasks where the AI's output could have significant consequences.
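The router's job reduces to turning a free-text LLM reply into one of a fixed set of node names. The sketch below isolates that mapping so it can be tested without an LLM. `ROUTES`, `ALIASES`, `parse_route`, and the fallback to `tech` are assumptions for illustration, not part of the lab. In LangGraph, a dictionary shaped like `ROUTES` can be passed as the `path_map` argument of `add_conditional_edges` (with the routing function returning the label), which also tells the graph every possible destination up front.

```python
import re
from typing import Dict

# Hypothetical label -> node-name map (the shape LangGraph accepts as `path_map`)
ROUTES: Dict[str, str] = {"prd": "PRD_RESEARCHER", "tech": "TECH_RESEARCHER"}

# Words the LLM might use for each label (assumed; extend as needed)
ALIASES: Dict[str, str] = {"prd": "prd", "tech": "tech", "technical": "tech"}


def parse_route(reply: str, default: str = "tech") -> str:
    """Return the single route label named in the reply, or `default` if
    no label or more than one label appears.

    Only whole words count, so a label must appear as its own word.
    """
    words = set(re.findall(r"[a-z]+", reply.lower()))
    found = {ALIASES[w] for w in words if w in ALIASES}
    return found.pop() if len(found) == 1 else default


for reply in ["prd", "PRD.", "Technical expert", "Route to the 'tech' expert", "either prd or tech", "I'm not sure"]:
    label = parse_route(reply)
    print(f"{reply!r:32} -> {label} -> {ROUTES[label]}")
```

Falling back to `tech` mirrors the lab's `else` branch; whether an ambiguous reply should instead raise an error is a design choice.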

In [None]:
# 1. Create specialized retrievers
prd_retriever = create_knowledge_base(["artifacts/prd.md"])
tech_retriever = create_knowledge_base(["artifacts/schema.sql", "artifacts/adr_001_framework_choice.md"])

class ResearchState(TypedDict):
    question: str
    documents: List[Document]
    draft_answer: str
    final_answer: str

# 2. Define the agent nodes
def prd_researcher_agent(state):
    print("---NODE: PRD RESEARCHER---")
    documents = prd_retriever.invoke(state["question"])
    return {"documents": documents}

def tech_researcher_agent(state):
    print("---NODE: TECH RESEARCHER---")
    documents = tech_retriever.invoke(state["question"])
    return {"documents": documents}

def synthesizer_agent(state):
    print("---NODE: SYNTHESIZER---")
    prompt = f"Based on the following documents, create a concise draft answer to the user's question.\n\nQuestion: {state['question']}\n\nDocuments: {state['documents']}"
    draft = llm.invoke(prompt).content
    return {"draft_answer": draft}

def human_validation_node(state):
    print("---NODE: HUMAN VALIDATION---")
    print(f"Draft Answer: {state['draft_answer']}")
    print("Sources:", [doc.metadata['source'] for doc in state['documents']])
    response = input("Is this answer helpful and correct? (yes/no): ")
    if response.strip().lower() == 'yes':
        return {"final_answer": state['draft_answer']}
    else:
        return {"final_answer": "The user rejected the draft answer."}

def project_manager_router(state):
    print("---NODE: PROJECT MANAGER (ROUTER)---")
    prompt = f"You are a project manager. Based on the user's question, should you route this to the PRD expert or the Technical expert? Answer with 'prd' or 'tech'.\n\nQuestion: {state['question']}"
    decision = llm.invoke(prompt).content
    print(f"PM Decision: Route to {decision}")
    if 'prd' in decision.lower():
        return "PRD_RESEARCHER"
    else:
        return "TECH_RESEARCHER"

# 3. Build the graph
workflow_v3 = StateGraph(ResearchState)
workflow_v3.add_node("PRD_RESEARCHER", prd_researcher_agent)
workflow_v3.add_node("TECH_RESEARCHER", tech_researcher_agent)
workflow_v3.add_node("SYNTHESIZE", synthesizer_agent)
workflow_v3.add_node("VALIDATE", human_validation_node)

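# "__start__" is LangGraph's built-in start node (the value of langgraph.graph.START),
# so the PM router chooses which researcher runs first.
workflow_v3.add_conditional_edges(
    "__start__",
    project_manager_router,
)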
workflow_v3.add_edge("PRD_RESEARCHER", "SYNTHESIZE")
workflow_v3.add_edge("TECH_RESEARCHER", "SYNTHESIZE")
workflow_v3.add_edge("SYNTHESIZE", "VALIDATE")
workflow_v3.add_edge("VALIDATE", END)

app_v3 = workflow_v3.compile()

print("\n--- Invoking 5-Agent Research Team ---")
inputs = {"question": "What framework was chosen and why?"}
# The VALIDATE node blocks on input(), so uncomment these lines to run the graph interactively.
# result = app_v3.invoke(inputs)
# print(f"Final Answer: {result['final_answer']}")
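Because `human_validation_node` calls `input()` directly, this cell cannot run unattended (for example in automated tests). One option is to inject the prompting function so a test can supply scripted answers; `make_validation_node` and its `ask` parameter are hypothetical names, and accepting `"y"` as approval is a choice of this sketch. For long-running or server-side workflows, LangGraph's own mechanism is to compile the graph with a checkpointer and pause before a node (`interrupt_before=["VALIDATE"]`) rather than block on `input()`; that approach is not shown here.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict


def make_validation_node(ask: Callable[[str], str]) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    """Build a human-validation node that calls `ask` instead of input() directly."""

    def validate(state: Dict[str, Any]) -> Dict[str, Any]:
        sources = [doc.metadata["source"] for doc in state["documents"]]
        question = (
            f"Draft Answer: {state['draft_answer']}\n"
            f"Sources: {sources}\n"
            "Is this answer helpful and correct? (yes/no): "
        )
        reply = ask(question).strip().lower()
        if reply in ("y", "yes"):  # accepting "y" is an assumption of this sketch
            return {"final_answer": state["draft_answer"]}
        return {"final_answer": "The user rejected the draft answer."}

    return validate


# Placeholder state; SimpleNamespace stands in for a Document with a .metadata dict
state = {
    "draft_answer": "Placeholder draft answer.",
    "documents": [SimpleNamespace(metadata={"source": "artifacts/adr_001_framework_choice.md"})],
}

approve = make_validation_node(lambda prompt: "yes")  # scripted reviewer who approves
reject = make_validation_node(lambda prompt: "no")    # scripted reviewer who rejects
print(approve(state)["final_answer"])
print(reject(state)["final_answer"])
```

In the notebook, `workflow_v3.add_node("VALIDATE", make_validation_node(input))` would keep the interactive behaviour, since `input` takes a prompt string and returns the typed reply.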

## Lab Conclusion

Incredible work! You have now built a truly sophisticated AI system. You've learned how to create a knowledge base for an agent and how to use LangGraph to orchestrate a team of specialized agents to solve a complex problem. Most importantly, you implemented a human-in-the-loop validation step, which is a critical pattern for building safe, reliable, and trustworthy AI applications. In the next lab, we will integrate this powerful system into our FastAPI backend.