# Day 6 - Lab 1: Building RAG Systems (Solution)

**Objective:** Build a RAG (Retrieval-Augmented Generation) system orchestrated by LangGraph, scaling in complexity from a simple retriever to a multi-agent team that includes a grader and a router.

**Introduction:**
This solution notebook provides the complete code and explanations for building the multi-agent RAG system. It demonstrates how to use LangGraph to create increasingly complex and capable agentic workflows.

For definitions of key terms used in this lab, please refer to the [GLOSSARY.md](../../GLOSSARY.md).

## Step 1: Setup

In [None]:
import sys
import os

# Add the project's root directory (two levels above the notebook's working directory) to the Python path.
# os.path.abspath/os.path.join never raise IndexError, so no try/except fallback is needed.
project_root = os.path.abspath(os.path.join(os.getcwd(), '..', '..'))

if project_root not in sys.path:
    sys.path.insert(0, project_root)

import importlib
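def install_if_missing(package, import_name=None):
    """Install `package` with pip if its module cannot be imported.

    Some distributions are imported under a different name than the one used
    by pip (e.g. faiss-cpu -> faiss), so the module name can be passed separately.
    """
    try:
        importlib.import_module(import_name or package)
    except ImportError:
        print(f"{package} not found, installing...")
        import subprocess
        subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", package])

install_if_missing('langgraph')
install_if_missing('langchain')
install_if_missing('langchain-community', 'langchain_community')
install_if_missing('langchain-openai', 'langchain_openai')
install_if_missing('faiss-cpu', 'faiss')
install_if_missing('pypdf')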

from utils import setup_llm_client, load_artifact
from typing import List, TypedDict
from langchain_core.documents import Document
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.graph import StateGraph, END

client, model_name, api_provider = setup_llm_client(model_name="gpt-4o")
llm = ChatOpenAI(model=model_name)
embeddings = OpenAIEmbeddings()

## Step 2: Building the Knowledge Base

**Explanation:**
This function loads each project document, splits it into overlapping chunks (roughly 1,000 characters each, with 200 characters of overlap between neighboring chunks), and indexes the chunks in a FAISS vector store. The vector store converts each chunk into a numerical embedding, which enables efficient semantic search. The function returns a `retriever` object, the component our agents will use to query the knowledge base, or `None` if none of the files could be found.
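To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-window character splitter. It is a simplification, not how `RecursiveCharacterTextSplitter` works internally: the real splitter first tries to break on separators such as blank lines, newlines, and spaces, so its chunk boundaries fall on natural breaks rather than at exact character offsets. The function name `split_text` is hypothetical.

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Hypothetical fixed-window splitter: consecutive chunks share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

sample = "abcdefghij" * 3  # 30 characters
for chunk in split_text(sample, chunk_size=12, chunk_overlap=4):
    print(repr(chunk))
```

The overlap means a sentence cut at a chunk boundary still appears intact in at least one neighbor, at the cost of storing and embedding some text twice.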

In [None]:
def create_knowledge_base(file_paths):
    """Loads documents from given paths and creates a FAISS vector store.""" 
    all_docs = []
    for path in file_paths:
        full_path = os.path.join(project_root, path)
        if os.path.exists(full_path):
            loader = TextLoader(full_path, encoding="utf-8")
            docs = loader.load()
            for doc in docs:
                doc.metadata["source"] = path  # Record the project-relative path as the source
            all_docs.extend(docs)
        else:
            print(f"Warning: Artifact not found at {full_path}")

    if not all_docs:
        print("No documents found to create knowledge base.")
        return None

    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    splits = text_splitter.split_documents(all_docs)
    
    print(f"Creating vector store from {len(splits)} document splits...")
    vectorstore = FAISS.from_documents(documents=splits, embedding=embeddings)
    return vectorstore.as_retriever()

all_artifact_paths = ["artifacts/day1_prd.md", "artifacts/schema.sql", "artifacts/adr_001_database_choice.md"]
retriever = create_knowledge_base(all_artifact_paths)
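Semantic search in the vector store boils down to three steps: embed the query, score each stored chunk against it, and return the best matches. The sketch below illustrates that idea with a toy bag-of-words "embedding" and cosine similarity instead of `OpenAIEmbeddings` and FAISS; the helper names (`embed`, `cosine`, `top_k`) and the sample chunks are hypothetical. Real embeddings capture meaning rather than shared words, and FAISS performs the search in optimized native code rather than a Python sort.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank every chunk by similarity to the query and keep the k best."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "the users table stores email and password hash",
    "the prd describes onboarding for new hires",
    "we chose postgresql for the database",
]
print(top_k("which columns are in the users table", chunks, k=1))
```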

## Step 3: The Challenges - Solutions

### Challenge 1 (Foundational): A Simple RAG Graph

**Explanation:**
This is the simplest form of a LangGraph system: a linear pipeline with no branching.
1.  **`AgentState`**: We define the 'state' of our graph using a `TypedDict` (here, `SimpleAgentState`). This is the shared memory that all nodes in the graph can read from and write to.
2.  **Nodes**: Each node is a Python function that receives the current state and returns a dictionary containing only the keys it wants to update. The `retrieve` node calls our retriever, and the `generate` node calls the LLM.
3.  **Graph Definition**: We instantiate `StateGraph` and add our nodes. The `set_entry_point` and `add_edge` methods define the directed flow of the graph, and `compile()` creates the runnable graph object.
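The execution model described above can be illustrated with a toy, pure-Python graph runner. `TinyGraph` is a hypothetical class, not LangGraph's implementation: it supports only one unconditional edge per node, and it merges each node's returned dictionary into the shared state, which roughly matches LangGraph's default behavior of overwriting a state key with the latest value written to it.

```python
from typing import Callable, Dict, Optional

TINY_END = "__end__"  # hypothetical sentinel; plays the role of LangGraph's END here

class TinyGraph:
    """Toy linear graph: each node returns a partial state update that is merged in."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Callable[[dict], dict]] = {}
        self.edges: Dict[str, str] = {}
        self.entry: Optional[str] = None

    def add_node(self, name: str, fn: Callable[[dict], dict]) -> None:
        self.nodes[name] = fn

    def add_edge(self, src: str, dst: str) -> None:
        self.edges[src] = dst  # one outgoing edge per node in this toy

    def set_entry_point(self, name: str) -> None:
        self.entry = name

    def invoke(self, inputs: dict) -> dict:
        state = dict(inputs)  # copy so the caller's dict is not mutated
        current = self.entry
        while current != TINY_END:
            update = self.nodes[current](state)  # the node reads the shared state...
            state.update(update)                 # ...and its returned keys overwrite old values
            current = self.edges[current]
        return state

graph = TinyGraph()
graph.add_node("RETRIEVE", lambda s: {"documents": [f"chunk about {s['question']}"]})
graph.add_node("GENERATE", lambda s: {"answer": f"Answer based on {len(s['documents'])} chunk(s)"})
graph.set_entry_point("RETRIEVE")
graph.add_edge("RETRIEVE", "GENERATE")
graph.add_edge("GENERATE", TINY_END)
print(graph.invoke({"question": "project purpose"}))
```

Because nodes return only the keys they change, `generate` never has to copy `question` or `documents` forward; the runner keeps them in the shared state.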

In [None]:
class SimpleAgentState(TypedDict):
    question: str
    documents: List[Document]
    answer: str

def retrieve(state):
    print("---NODE: RETRIEVE DOCUMENTS---")
    question = state["question"]
    documents = retriever.invoke(question)
    return {"documents": documents, "question": question}

def generate(state):
    print("---NODE: GENERATE ANSWER---")
    question = state["question"]
    documents = state["documents"]
    # Join the chunk texts instead of interpolating the Document objects' repr
    context = "\n\n".join(doc.page_content for doc in documents)
    prompt = f"""You are an assistant for question-answering tasks. Use the following retrieved context to answer the question. If you don't know the answer, just say that you don't know.\n\nQuestion: {question}\n\nContext: {context}\n\nAnswer:"""
    answer = llm.invoke(prompt).content
    return {"answer": answer}

workflow_v1 = StateGraph(SimpleAgentState)
workflow_v1.add_node("RETRIEVE", retrieve)
workflow_v1.add_node("GENERATE", generate)
workflow_v1.set_entry_point("RETRIEVE")
workflow_v1.add_edge("RETRIEVE", "GENERATE")
workflow_v1.add_edge("GENERATE", END)

app_v1 = workflow_v1.compile()

print("\n--- Invoking Simple RAG Graph ---")
inputs = {"question": "What is the purpose of this project according to the PRD?"}
result = app_v1.invoke(inputs)
print(f"Final Answer: {result['answer']}")
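Interpolating a list of `Document` objects directly into an f-string inserts their Python `repr`, including metadata and quoting noise. A common alternative is to render each chunk's text under a numbered source label, which also gives the model something to cite. The sketch below uses a hypothetical `Doc` dataclass with the same two fields as LangChain's `Document` (`page_content` and `metadata`) so it runs without LangChain installed; `format_context` is a hypothetical helper and the sample texts are placeholders.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Hypothetical stand-in with the same two fields as LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_context(docs: list[Doc]) -> str:
    """Render retrieved chunks as numbered, source-labelled plain text for a prompt."""
    blocks = []
    for i, doc in enumerate(docs, start=1):
        source = doc.metadata.get("source", "unknown")
        blocks.append(f"[{i}] (source: {source})\n{doc.page_content}")
    return "\n\n".join(blocks)

docs = [
    Doc("Sample PRD text.", {"source": "artifacts/day1_prd.md"}),
    Doc("Sample schema text.", {"source": "artifacts/schema.sql"}),
]
print(format_context(docs))
```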

### Challenge 2 (Intermediate): A Graph with a Grader Agent

**Explanation:**
Adding a Grader agent stops the system from generating an answer when the retrieved documents are irrelevant to the question. This reduces one common source of hallucination and makes the RAG system more trustworthy, because it can decline to answer instead of making something up. In this solution, the graph simply ends without an `answer`, and the calling code prints a fallback message.

1.  **Grader Node (`grade_documents`):** We create a new node, registered as `GRADE`, whose sole purpose is to act as a grader. It calls the LLM with a narrow prompt that asks for a 'yes' or 'no' verdict on whether the retrieved documents are relevant to the question.
2.  **Conditional Edge:** This is the key concept. `workflow_v2.add_conditional_edges` tells the graph to call a function (`decide_to_generate`) after the `GRADE` node runs. This function inspects the grader's output and returns the name of the *next* node to execute, or `END` to stop. This enables dynamic routing: the graph only spends an LLM call on generation when the retrieved evidence is relevant.
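A conditional edge is only as reliable as the string it inspects. An exact comparison such as `state["grade"].lower() == "yes"` treats a reply like `"Yes."` as a no. The sketch below normalizes the grader's free-form reply before routing; `normalize_grade`, `route_after_grade`, and `END_NODE` are hypothetical names (`END_NODE` mirrors the `"__end__"` value of LangGraph's `END` constant but is defined locally so the sketch runs on its own).

```python
END_NODE = "__end__"  # hypothetical local stand-in for LangGraph's END constant

def normalize_grade(raw: str) -> str:
    """Map free-form grader output such as 'Yes.' or ' NO ' to 'yes' or 'no'."""
    return "yes" if raw.strip().lower().startswith("yes") else "no"

def route_after_grade(state: dict) -> str:
    """Conditional-edge function: return the name of the next node to run."""
    if normalize_grade(state["grade"]) == "yes":
        return "GENERATE"
    return END_NODE

for reply in ["yes", "Yes.", "  YES, the schema is mentioned", "No", "I am not sure"]:
    print(repr(reply), "->", route_after_grade({"grade": reply}))
```

Treating anything that does not start with "yes" as "no" is a deliberately conservative choice: an unclear grade ends the run rather than generating an answer from possibly irrelevant context.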

In [None]:
class GraderAgentState(SimpleAgentState):
    grade: str

def grade_documents(state):
    print("---NODE: GRADE DOCUMENTS---")
    question = state["question"]
    documents = state["documents"]
    context = "\n\n".join(doc.page_content for doc in documents)
    prompt = f"""You are a grader assessing relevance of retrieved documents to a user question. If the documents contain keywords related to the user question, grade them as relevant. It does not need to be a stringent test. The goal is to filter out erroneous retrievals. Answer with only 'yes' or 'no'.\n\nRetrieved Documents: {context}\n\nUser Question: {question}"""
    # Normalize the reply so variants such as "Yes." are stored as "yes"
    raw_grade = llm.invoke(prompt).content.strip().lower()
    grade = "yes" if raw_grade.startswith("yes") else "no"
    return {"grade": grade}

def decide_to_generate(state):
    print("---EDGE: DECIDE TO GENERATE---")
    # Tolerate grader replies such as "Yes." or " YES "
    if state["grade"].strip().lower().startswith("yes"):
        print("DECISION: Documents are relevant. Proceed to generation.")
        return "GENERATE"
    else:
        print("DECISION: Documents are not relevant. End process.")
        return END

workflow_v2 = StateGraph(GraderAgentState)
workflow_v2.add_node("RETRIEVE", retrieve)
workflow_v2.add_node("GRADE", grade_documents)
workflow_v2.add_node("GENERATE", generate)

workflow_v2.set_entry_point("RETRIEVE")
workflow_v2.add_edge("RETRIEVE", "GRADE")
workflow_v2.add_conditional_edges("GRADE", decide_to_generate)
workflow_v2.add_edge("GENERATE", END)

app_v2 = workflow_v2.compile()

print("\n--- Invoking Grader Graph with a relevant question ---")
inputs = {"question": "What database schema will we use?"}
result = app_v2.invoke(inputs)
print(f"Final Answer: {result.get('answer', 'Could not answer question.')}")

print("\n--- Invoking Grader Graph with an irrelevant question ---")
inputs = {"question": "What is the weather in Paris?"}
result = app_v2.invoke(inputs)
print(f"Final Answer: {result.get('answer', 'Could not answer question.')}")

### Challenge 3 (Advanced): A Multi-Agent Research Team

**Explanation:**
This is a highly advanced workflow that mimics a real research team.
1.  **Specialized Retrievers:** We create two separate vector stores and retrievers. This specialization allows us to direct queries to the most relevant knowledge source.
2.  **Router/PM Agent:** The `project_manager_router` function acts as a router. Instead of one agent searching through all documents, the router first makes a single, short LLM call to delegate the question to a specialist with a smaller, more focused knowledge base. When the routing is correct, this can improve retrieval precision, because chunks from unrelated documents cannot crowd out the relevant ones; the trade-off is one extra LLM call and the risk of sending a question to the wrong specialist.
3.  **Graph Construction:** We build the most complex graph yet. The router is attached as a conditional edge from the graph's start (`"__start__"`), so it runs before any node. Based on its decision, the graph flows to one of the two specialist researchers. Both paths then converge on the `SYNTHESIZE` node, which creates the final answer.
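The route, specialize, and converge flow can be simulated without any LLM. In the sketch below, `router_reply` stands in for the project manager's raw LLM output, `KNOWLEDGE` is a hypothetical pair of tiny per-specialist stores with placeholder text, and `run_team` walks the same path as the graph: route, retrieve from only the chosen store, then synthesize. As in the solution's router, a reply that does not name the PRD researcher ends up at `TECH_RESEARCHER`.

```python
KNOWLEDGE = {  # hypothetical, tiny per-specialist stores (placeholder text)
    "PRD_RESEARCHER": ["Sample PRD chunk about user personas."],
    "TECH_RESEARCHER": ["Sample schema chunk about the users table."],
}

def parse_route(reply: str, default: str = "TECH_RESEARCHER") -> str:
    """Return the first known route label found in the LLM's reply, else the default."""
    upper = reply.upper()
    for route in KNOWLEDGE:
        if route in upper:
            return route
    return default

def run_team(question: str, router_reply: str) -> dict:
    route = parse_route(router_reply)  # conditional edge from the start
    documents = KNOWLEDGE[route]       # the chosen specialist searches only its own store
    # SYNTHESIZE: both specialist paths converge here
    answer = f"Answer to {question!r} from {len(documents)} {route} doc(s)"
    return {"route": route, "documents": documents, "answer": answer}

print(run_team("Who are the main user personas?", "Route to PRD_RESEARCHER."))
print(run_team("What columns are in the users table?", "I'm not sure"))
```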

In [None]:
# 1. Create specialized retrievers
prd_retriever = create_knowledge_base(["artifacts/day1_prd.md"])
tech_retriever = create_knowledge_base(["artifacts/schema.sql", "artifacts/adr_001_database_choice.md"])

class ResearchTeamState(TypedDict):
    question: str
    documents: List[Document]
    answer: str

# 2. Define the agent nodes
def prd_researcher(state):
    print("---NODE: PRD RESEARCHER---")
    documents = prd_retriever.invoke(state["question"])
    return {"documents": documents}

def tech_researcher(state):
    print("---NODE: TECH RESEARCHER---")
    documents = tech_retriever.invoke(state["question"])
    return {"documents": documents}

def synthesize_answer(state):
    print("---NODE: SYNTHESIZE ANSWER---")
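    # Join the chunk texts instead of interpolating the Document objects' repr
    context = "\n\n".join(doc.page_content for doc in state["documents"])
    prompt = f"Based on the following documents, create a concise answer to the user's question.\n\nQuestion: {state['question']}\n\nDocuments: {context}"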
    answer = llm.invoke(prompt).content
    return {"answer": answer}

def project_manager_router(state):
    print("---NODE: PROJECT MANAGER (ROUTER)---")
    prompt = f"You are a project manager. Based on the user's question, should you route this to the PRD expert or the Technical expert? Answer with only 'PRD_RESEARCHER' or 'TECH_RESEARCHER'.\n\nQuestion: {state['question']}"
    decision = llm.invoke(prompt).content
    print(f"PM Decision: Route to {decision}")
    # Substring match tolerates extra words in the reply; anything else falls back to the tech researcher
    if 'PRD_RESEARCHER' in decision:
        return "PRD_RESEARCHER"
    else:
        return "TECH_RESEARCHER"

# 3. Build the graph
workflow_v3 = StateGraph(ResearchTeamState)
workflow_v3.add_node("PRD_RESEARCHER", prd_researcher)
workflow_v3.add_node("TECH_RESEARCHER", tech_researcher)
workflow_v3.add_node("SYNTHESIZE", synthesize_answer)

# "__start__" is the name of LangGraph's START node, so the router runs before any other node
workflow_v3.add_conditional_edges("__start__", project_manager_router)
workflow_v3.add_edge("PRD_RESEARCHER", "SYNTHESIZE")
workflow_v3.add_edge("TECH_RESEARCHER", "SYNTHESIZE")
workflow_v3.add_edge("SYNTHESIZE", END)

app_v3 = workflow_v3.compile()

print("\n--- Invoking Research Team with a PRD question ---")
inputs = {"question": "What are the main user personas for this application?"}
result = app_v3.invoke(inputs)
print(f"Final Answer: {result['answer']}")

print("\n--- Invoking Research Team with a technical question ---")
inputs = {"question": "What columns are in the users table?"}
result = app_v3.invoke(inputs)
print(f"Final Answer: {result['answer']}")

## Lab Conclusion

Incredible work! You have now built a multi-agent RAG system. You've learned how to create a knowledge base for an agent and how to use LangGraph to orchestrate a team of specialized agents. You progressed from a simple two-node RAG graph to a system that includes quality checks (the Grader) and task delegation (the Router). These are core patterns behind many production RAG applications.

> **Key Takeaway:** LangGraph allows you to define complex, stateful, multi-agent workflows as a graph. Using nodes for agents and conditional edges for decision-making enables the creation of sophisticated systems that can reason, delegate, and collaborate to solve problems more effectively than a single agent could alone.