# Day 6 - Lab 1: Building a Multi-Agent RAG System

**Objective:** Build a RAG (Retrieval-Augmented Generation) system orchestrated by LangGraph, scaling in complexity from a single agent to a multi-agent team that can reason about a knowledge base.

**Estimated Time:** 135 minutes

**Introduction:**
Welcome to Day 6! Today we build one of the most common patterns in enterprise AI: a system that answers questions about your private documents by retrieving the relevant passages and handing them to an LLM as context. We will use LangGraph to create a 'research team' of AI agents. Each agent is a node in the graph with a specific job, and LangGraph acts as the manager, passing the shared state from one agent to the next until the team produces an answer.

## Step 1: Setup

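We need several libraries for this lab. `langgraph` is the core orchestrator, `langchain` provides the building blocks, `langchain_community` supplies the FAISS wrapper and document loaders, `langchain_openai` supplies the OpenAI embedding model, `faiss-cpu` provides the vector index itself, and `pypdf` backs `PyPDFLoader` for loading PDF documents. `OpenAIEmbeddings` reads your API key from the `OPENAI_API_KEY` environment variable.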

In [None]:
import sys
import os

# Add the project's root directory to the Python path
try:
    # This works when running as a script
    project_root = os.path.abspath(os.path.join(os.path.dirname(__file__), '..', '..'))
except NameError:
    # This works when running in an interactive environment (like a notebook)
    # We go up two levels from the notebook's directory to the project root.
    project_root = os.path.abspath(os.path.join(os.getcwd(), '..', '..'))

if project_root not in sys.path:
    sys.path.insert(0, project_root)

In [None]:
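# This helper installs a package if it cannot be imported.
# The pip name and the import name can differ (e.g. `faiss-cpu` is imported
# as `faiss`), so the import name can be passed separately.
import importlib
def install_if_missing(package, import_name=None):
    try:
        importlib.import_module(import_name or package)
    except ImportError:
        print(f"{package} not found, installing...")
        %pip install -q {package}

install_if_missing('langgraph')
install_if_missing('langchain')
install_if_missing('langchain-community', 'langchain_community')
install_if_missing('langchain-openai', 'langchain_openai')
install_if_missing('faiss-cpu', 'faiss')
install_if_missing('pypdf')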

import os
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from utils import setup_llm_client, load_artifact

client, model_name, api_provider = setup_llm_client()
embeddings = OpenAIEmbeddings()

## Step 2: Building the Knowledge Base

An agent is only as smart as the information it can access. We will create a vector store containing all the project artifacts we've created so far. This will be our agent's 'knowledge base'.

In [None]:
from langchain_core.documents import Document

def create_knowledge_base():
    """Loads the project artifacts and returns a retriever over a FAISS vector store."""
    artifact_paths = [
        "artifacts/prd.md",
        "artifacts/schema.sql",
        "artifacts/adr_001_framework_choice.md"
    ]
    all_docs = []
    for path in artifact_paths:
        # Assumption: artifacts live under the project root computed in the setup
        # cell, not under the notebook's working directory (two levels below it).
        full_path = os.path.join(project_root, path)
        if os.path.exists(full_path):
            # For simplicity, we treat each file as a single document.
            # A more advanced loader could handle different file types.
            content = load_artifact(full_path)
            all_docs.append(Document(page_content=content, metadata={"source": path}))
        else:
            print(f"Warning: Artifact not found at {full_path}")

    if not all_docs:
        print("No documents found to create knowledge base.")
        return None

    # Split into ~1000-character chunks that overlap by ~200 characters, so text
    # near a chunk boundary appears in both neighbouring chunks.
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    splits = text_splitter.split_documents(all_docs)

    print(f"Creating vector store from {len(splits)} document splits...")
    vectorstore = FAISS.from_documents(documents=splits, embedding=embeddings)
    return vectorstore.as_retriever()

retriever = create_knowledge_base()
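The `chunk_size=1000, chunk_overlap=200` settings control how the artifacts are cut up before embedding. The sketch below is a simplified fixed-width character window, not `RecursiveCharacterTextSplitter`'s actual algorithm (which prefers to break on blank lines, then newlines, then spaces), but it shows what overlap means: each chunk starts `chunk_size - chunk_overlap` characters after the previous one, so text near a boundary lands in both neighbours. The helper name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Hypothetical helper: split text into fixed-width, overlapping character windows.

    Simplified illustration only; RecursiveCharacterTextSplitter instead tries to
    break on separators such as blank lines, newlines, and spaces.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # distance between the starts of consecutive chunks
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Scaled-down version of the lab's settings (1000/200 becomes 10/2).
chunks = sliding_window_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=2)
for chunk in chunks:
    print(chunk)
```

This prints `abcdefghij`, `ijklmnopqr`, and `qrstuvwxyz`: each chunk repeats the last two characters of the one before it. Larger overlap improves the odds that a fact split across a boundary is retrieved intact, at the cost of more chunks to embed.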

## Step 3: The Challenges

### Challenge 1 (Foundational): A Single-Agent RAG System

**Task:** Build a simple LangGraph with two nodes: one to retrieve documents and one to generate an answer.

**Instructions:**
1.  Define the state for your graph as a `TypedDict`. It should contain keys for `question` and `documents`, plus a key such as `generation` to hold the final answer.
2.  Create a "Retriever Agent" node. This is a Python function that takes the state, uses the `retriever` to get documents relevant to the question, and returns a dict with the updated `documents` key; LangGraph merges the returned keys into the state.
3.  Create a "Generate Answer" node. This function takes the state, builds a prompt from the question and the retrieved documents, calls the LLM, and returns the answer under the answer key.
4.  Build the `StateGraph`, add the nodes, and define the edges: `START` -> `RETRIEVE` -> `GENERATE` -> `END` (`StateGraph`, `START`, and `END` are imported from `langgraph.graph`).
5.  Compile the graph and invoke it with a question about your project.

**Expected Quality:** A functional graph that can answer a simple question (e.g., "What is the purpose of this project?") by retrieving context from the project artifacts.
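A minimal sketch of the Challenge 1 building blocks, assuming a `RAGState` `TypedDict` with `question`, `documents`, and `generation` keys. LangGraph nodes are plain functions that receive the current state and return a dict of the keys they update, which is what `retrieve` and `generate` do here. To keep the sketch self-contained, the retriever and the LLM are passed in as plain callables; `make_retrieve_node`, `make_generate_node`, `fake_search`, and `fake_llm` are hypothetical names, and the final loop is a plain-Python stand-in for the compiled graph, not LangGraph itself.

```python
from typing import Callable, List, TypedDict


class RAGState(TypedDict, total=False):
    question: str
    documents: List[str]  # page_content of the retrieved chunks
    generation: str       # the final answer


def make_retrieve_node(search: Callable[[str], List[str]]):
    """Build a RETRIEVE node around any function mapping a question to chunk texts."""
    def retrieve(state: RAGState) -> dict:
        # Return only the key this node updates; LangGraph merges it into the state.
        return {"documents": search(state["question"])}
    return retrieve


def make_generate_node(llm: Callable[[str], str]):
    """Build a GENERATE node around any function mapping a prompt to a completion."""
    def generate(state: RAGState) -> dict:
        context = "\n\n".join(state["documents"])
        prompt = (
            "Answer the question using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {state['question']}"
        )
        return {"generation": llm(prompt)}
    return generate


# Hypothetical stand-ins so the sketch runs without a vector store or API key.
prompts_seen: List[str] = []

def fake_search(question: str) -> List[str]:
    return ["Example chunk about the project's purpose."]

def fake_llm(prompt: str) -> str:
    prompts_seen.append(prompt)
    return "stub answer"


# Plain-Python stand-in for invoking the compiled RETRIEVE -> GENERATE graph:
# run each node in order and merge its update into the state.
state: RAGState = {"question": "What is the purpose of this project?"}
for node in (make_retrieve_node(fake_search), make_generate_node(fake_llm)):
    state = {**state, **node(state)}
print(state["generation"])  # stub answer
```

To run these as a real graph, register the same functions with LangGraph: `graph = StateGraph(RAGState)`, `graph.add_node("retrieve", ...)`, `graph.add_node("generate", ...)`, `graph.add_edge(START, "retrieve")`, `graph.add_edge("retrieve", "generate")`, `graph.add_edge("generate", END)`, then `app = graph.compile()` and `app.invoke({"question": ...})`. A real search function could be `lambda q: [d.page_content for d in retriever.invoke(q)]`. Injecting the dependencies keeps each node testable on its own.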

In [None]:
# TODO: Write the code for the single-agent RAG system using LangGraph.
# This will involve defining the state, the nodes, and the graph itself.


### Challenge 2 (Intermediate): A Two-Agent System with a Grader

**Task:** Add a second agent to your graph that acts as a "Grader," deciding if the retrieved documents are relevant enough to answer the question.

**Instructions:**
1.  Keep your `RETRIEVE` node from the previous challenge.
2.  Create a new "Grader Agent" node. This function takes the state (question and documents) and calls an LLM with a specific prompt: "Based on the question and the following documents, is the information sufficient to answer the question? Answer with only 'yes' or 'no'." Normalize the reply (trim whitespace and punctuation, lowercase it) before storing it in the state, because models do not always return a bare `yes` or `no`.
3.  Add a normal edge from `RETRIEVE` to `GRADE`, followed by a **conditional edge** out of `GRADE` that checks the grader's response. If 'yes', the graph proceeds to the `GENERATE` node. If 'no', it goes to `END` (LangGraph's built-in terminal node), concluding that it cannot answer the question.

**Expected Quality:** A more robust graph that gracefully handles cases where its knowledge base doesn't contain the answer, reducing the risk that it hallucinates one.
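The fragile part of Challenge 2 is turning the grader's free-text reply into a routing decision, since models often answer "Yes." or " no" rather than a bare token. A minimal sketch, assuming the state carries a hypothetical `grade` key and that anything other than a clear "yes" is treated as "no" (a conservative choice that favors declining over guessing). `normalize_verdict`, `make_grade_node`, `route_after_grade`, and `NO_ANSWER` are illustrative names, and the LLM is again injected as a plain callable.

```python
import re
from typing import Callable, List, TypedDict


class GradedState(TypedDict, total=False):
    question: str
    documents: List[str]
    grade: str         # "yes" or "no", written by the GRADE node
    generation: str


GRADER_PROMPT = (
    "Based on the question and the following documents, is the information "
    "sufficient to answer the question? Answer with only 'yes' or 'no'.\n\n"
    "Question: {question}\n\nDocuments:\n{documents}"
)
NO_ANSWER = "I could not find enough information in the knowledge base to answer this."


def normalize_verdict(reply: str) -> str:
    """Map a free-text reply to 'yes' or 'no'; anything unclear counts as 'no'."""
    words = re.findall(r"[a-z]+", reply.lower())
    return "yes" if words[:1] == ["yes"] else "no"


def make_grade_node(llm: Callable[[str], str]):
    def grade(state: GradedState) -> dict:
        prompt = GRADER_PROMPT.format(
            question=state["question"], documents="\n\n".join(state["documents"])
        )
        verdict = normalize_verdict(llm(prompt))
        update = {"grade": verdict}
        if verdict == "no":
            # GENERATE will not run on this path, so record the refusal here.
            update["generation"] = NO_ANSWER
        return update
    return grade


def route_after_grade(state: GradedState) -> str:
    """Routing function for the conditional edge out of GRADE."""
    return "generate" if state["grade"] == "yes" else "end"


# Usage with a stub grader that replies the way real models often do.
state: GradedState = {"question": "Which database do we use?", "documents": ["Example schema chunk."]}
state.update(make_grade_node(lambda prompt: " Yes.\n")(state))
print(route_after_grade(state))  # generate
```

In LangGraph, the routing function plugs in as `graph.add_conditional_edges("grade", route_after_grade, {"generate": "generate", "end": END})`. Treating unclear replies as 'no' trades some unnecessary refusals for fewer confidently wrong answers.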

In [None]:
# TODO: Write the code for the two-agent system with a Grader and conditional edges.


### Challenge 3 (Advanced): A 5-Agent Research Team with Human-in-the-Loop

**Task:** Build a sophisticated "research team" of specialized agents and add a human validation step before the final answer is given.

**Instructions:**
1.  **Specialize your retriever:** Create two separate retrievers. One for the PRD (`prd_retriever`) and one for the technical documents (`tech_retriever` for schema and ADRs).
2.  **Define the Agents:**
    * `ProjectManagerAgent`: The entry point. It uses an LLM to decide whether the user's question is about product requirements or technical details, and routes to the appropriate researcher.
    * `PRDResearcherAgent`: A node that uses the `prd_retriever`.
    * `TechResearcherAgent`: A node that uses the `tech_retriever`.
    * `SynthesizerAgent`: A node that takes the collected documents from the researchers and synthesizes a draft answer.
3.  **Add a Human-in-the-Loop Node:** After the `SYNTHESIZE` node, create a special node that prints the draft answer and the source documents and then waits for user input (e.g., `input('Is this answer helpful? (yes/no): ')`).
4.  **Build the Graph:** Use a conditional edge out of `PM` to choose the researcher, and normal edges for the rest of the flow: `START` -> `PM` -> (`PRD_RESEARCHER` or `TECH_RESEARCHER`) -> `SYNTHESIZE` -> `HUMAN_VALIDATION` -> `END`.

**Expected Quality:** An advanced agentic system that mimics a real-world research workflow: specialist roles, information synthesis, and final human approval.
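A minimal sketch of the two pieces of Challenge 3 that differ most from the earlier challenges: the project manager's routing decision and the human validation step. It assumes hypothetical `route` and `human_approved` state keys; `make_pm_node`, `route_to_researcher`, and `make_human_validation_node` are illustrative names. The `ask` parameter defaults to the built-in `input`, but can be replaced with a stub so the node runs without a keyboard. This is a blocking, console-only approach; LangGraph also provides interrupt-and-resume mechanisms for production human-in-the-loop flows, which this sketch does not use.

```python
from typing import Callable, List, TypedDict


class ResearchState(TypedDict, total=False):
    question: str
    route: str              # "prd" or "tech", written by the PM node
    documents: List[str]    # chunk texts gathered by a researcher
    draft: str              # written by the SYNTHESIZE node
    human_approved: bool    # written by the HUMAN_VALIDATION node


PM_PROMPT = (
    "Classify the question as 'prd' if it is about product requirements, or "
    "'tech' if it is about technical details such as the database schema or "
    "architecture decisions. Answer with only 'prd' or 'tech'.\n\n"
    "Question: {question}"
)


def make_pm_node(llm: Callable[[str], str]):
    def project_manager(state: ResearchState) -> dict:
        reply = llm(PM_PROMPT.format(question=state["question"])).strip().lower()
        # Assumption: anything other than a clear 'prd' goes to the tech researcher.
        return {"route": "prd" if reply.startswith("prd") else "tech"}
    return project_manager


def route_to_researcher(state: ResearchState) -> str:
    """Routing function for the conditional edge out of PM."""
    return "prd_researcher" if state["route"] == "prd" else "tech_researcher"


def make_human_validation_node(ask: Callable[[str], str] = input):
    def human_validation(state: ResearchState) -> dict:
        print("Draft answer:\n" + state["draft"])
        print("Sources:")
        for doc in state.get("documents", []):
            print(" -", doc[:80])  # first 80 characters of each source chunk
        reply = ask("Is this answer helpful? (yes/no): ")
        return {"human_approved": reply.strip().lower().startswith("y")}
    return human_validation


# Usage with stubs in place of the LLM and the keyboard.
state: ResearchState = {"question": "Who are the target users?"}
state.update(make_pm_node(lambda prompt: "PRD")(state))
print(route_to_researcher(state))  # prd_researcher

state.update({"documents": ["Example chunk from prd.md"], "draft": "Stub draft answer."})
state.update(make_human_validation_node(ask=lambda prompt: "yes")(state))
print(state["human_approved"])  # True
```

In the real graph, `route_to_researcher` is the path function for `add_conditional_edges("pm", ...)`, mapping to the two researcher nodes, and each researcher then gets a normal edge to the synthesizer. For the specialized retrievers themselves, one straightforward option is to build a separate `FAISS` index for each group of artifacts.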

In [None]:
# TODO: Write the code for the five-agent research team with specialized retrievers and a human validation step.


## Lab Conclusion

Incredible work! You have now built a truly sophisticated AI system. You've learned how to create a knowledge base for an agent and how to use LangGraph to orchestrate a team of specialized agents to solve a complex problem. Most importantly, you implemented a human-in-the-loop validation step, which is a critical pattern for building safe, reliable, and trustworthy AI applications. In the next lab, we will integrate this powerful system into our FastAPI backend.