## Assignment 2:
Aim: to build a simple RAG pipeline in the LangGraph ecosystem that takes in the IITK academic template and warns me about the upcoming courses that I have to deal with.

Loading and splitting the documents:

In [13]:

from langchain_community.document_loaders import PyMuPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = PyMuPDFLoader("UG-Template[1].pdf")
documents = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(documents)
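
To make the effect of `chunk_size=500` and `chunk_overlap=50` concrete, here is a simplified sketch of fixed-size chunking with overlap. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter also tries to break on separators such as paragraphs and newlines before falling back to raw character counts). `sliding_chunks` is a hypothetical helper used only for illustration.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> List[str]:
    """Cut text into fixed-size windows; consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
    return chunks


# toy input: 1200 characters of repeating digits
sample = "".join(str(i % 10) for i in range(1200))
parts = sliding_chunks(sample)
print([len(p) for p in parts])         # sizes of the resulting windows
print(parts[0][-50:] == parts[1][:50])  # tail of one chunk equals head of the next
```

The overlap means a table row or sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.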

Embedding the chunks and storing them in the vector store (FAISS)

In [20]:
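# community integrations live in langchain_community (same package as the PDF loader above)
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings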

embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

vectorstore = FAISS.from_documents(chunks, embedding_model)

# retriever from the vector store
retriever = vectorstore.as_retriever(search_type="mmr", search_kwargs={"k": 4})
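
Since the retriever uses `search_type="mmr"`, a minimal sketch of maximal marginal relevance may help show what that setting trades off: each pick balances similarity to the query against similarity to the chunks already selected. This is a plain-Python illustration, not the FAISS or LangChain implementation. `mmr_select`, the `lambda_mult=0.5` default, and the toy 2-D vectors are assumptions made for the example.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query: List[float], docs: List[List[float]],
               k: int = 4, lambda_mult: float = 0.5) -> List[int]:
    """Greedily pick up to k document indices by maximal marginal relevance."""
    selected: List[int] = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(query, docs[i])
            # penalty: similarity to the closest already-selected document
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 1.0]
docs = [[1.0, 0.9], [1.0, 0.8], [0.2, 1.0]]  # docs 0 and 1 are near-duplicates
print(mmr_select(query, docs, k=2))                   # diversity-aware choice
print(mmr_select(query, docs, k=2, lambda_mult=1.0))  # pure similarity ranking
```

With `lambda_mult=1.0` the redundancy penalty vanishes and the selection reduces to plain similarity search, which is the comparison made in the inferences at the end.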

Setting up Groq and connecting it to LangChain

In [21]:
from langchain_groq import ChatGroq
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
import os
from dotenv import load_dotenv
load_dotenv()


llm = ChatGroq(
    api_key=os.getenv("GROQ_API_KEY"),
    model="llama3-8b-8192",
    temperature=0.2,
)

prompt_template = PromptTemplate.from_template("""
You are a helpful assistant that browses through the retrieved documents, focuses on the whole document and mostly on the tables given that represent the templates for programmes at IITK. You are able to link the full names of the programmes and departments to their respective abbreviations. You answer questions that the user has according to the following context:

Context:
{context}

Question:
{question}

Answer:""")

# building the rag chain
rag_chain = LLMChain(llm=llm, prompt=prompt_template)

Setting up the LangGraph part

In [22]:
from typing import TypedDict, List

# declaring the AgentState
class AgentState(TypedDict):
    question: str
    docs: List[str]       # retrieved context window
    answer: str           # answer generated
    

# defining the node that does the retrieval
def retrieve_node(state: AgentState) -> AgentState:
    # invoke() is the current retriever entry point; get_relevant_documents() is deprecated
    docs = retriever.invoke(state["question"])
    doc_texts = [doc.page_content for doc in docs]
    return {"docs": doc_texts}

# defining the node that generates answers
def generate_answer_node(state: AgentState) -> AgentState:
    context = "\n\n".join(state.get("docs", []))
    question = state.get("question", "")

    result = rag_chain.invoke({"context": context, "question": question})

    print("generate_answer_node invoked.")
    # result is the chain's output dict (inputs plus the generated "text"), not the prompt itself
    print("Chain output:\n", result)

    return {"answer": result["text"]}

Building the LangGraph workflow

In [23]:
from langgraph.graph import StateGraph, END

workflow = StateGraph(AgentState)

# adding nodes
workflow.add_node("retrieve", retrieve_node)
workflow.add_node("generate", generate_answer_node)

# defining the edges
workflow.set_entry_point("retrieve")       # entry point set directly instead of adding an edge from START
workflow.add_edge("retrieve", "generate")  # retrieve -> generate
workflow.add_edge("generate", END)         # generate -> END

app = workflow.compile()
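
To illustrate how the partial dicts returned by the two nodes combine into the shared state, here is a plain-Python emulation of a linear two-node graph. It is only a sketch: `run_linear_graph`, `fake_retrieve`, and `fake_generate` are hypothetical stand-ins, and the real LangGraph runtime does considerably more. Each yielded item is keyed by the node name, which is the shape the `app.stream(...)` loops in the trials unpack.

```python
from typing import Callable, Dict, Iterator, List, Tuple

State = Dict[str, object]
Node = Tuple[str, Callable[[State], State]]


def fake_retrieve(state: State) -> State:
    # stand-in for retrieve_node: returns only the key it changes
    return {"docs": [f"chunk about: {state['question']}"]}


def fake_generate(state: State) -> State:
    # stand-in for generate_answer_node: reads docs, writes answer
    context = "\n\n".join(state.get("docs", []))
    return {"answer": f"answer based on {len(context)} characters of context"}


def run_linear_graph(nodes: List[Node], state: State) -> Iterator[Dict[str, State]]:
    """Run nodes in order, merging each partial update into a private copy of the state."""
    current = dict(state)
    for name, fn in nodes:
        update = fn(current)
        current.update(update)  # later nodes see earlier nodes' output
        yield {name: update}    # one item per node, keyed by node name


nodes = [("retrieve", fake_retrieve), ("generate", fake_generate)]
final_state: State = {"question": "3rd semester courses?", "docs": [], "answer": ""}
for step in run_linear_graph(nodes, final_state):
    for node_name, update in step.items():
        final_state.update(update)  # merge the inner dict, not the outer one
        print(node_name, "->", list(update))
```

Merging the inner dict keeps the caller's copy of the state limited to `question`, `docs`, and `answer`; updating with the outer dict would instead add `retrieve` and `generate` as stray keys.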

## Trial 1:
with a somewhat vague (at least to the agent) question that uses an abbreviation

In [24]:
# ask the initial query
user_question = "What courses am I supposed to take in the 3rd semester of BT-BSBE"

# starting AgentState
# only the keys declared in AgentState
initial_state: AgentState = {
    "question": user_question,
    "docs": [],
    "answer": "",
}

state = initial_state.copy()

for step_output in app.stream(initial_state):
    if "retrieve" in step_output:
        print("\n==============================")
        print("📚 RETRIEVED DOCUMENTS:\n")
        for i, doc in enumerate(step_output["retrieve"]["docs"], 1):
            print(f"Document {i}:\n{doc}\n")

    elif "generate" in step_output:
        answer = step_output["generate"]["answer"]
        print("\n==============================")
        print("🤖 GENERATED ANSWER:\n")
        print(answer.strip(), "\n")
   
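    # step_output is {node_name: partial_update}; merge the inner dicts into state
    for node_update in step_output.values():
        state.update(node_update)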


📚 RETRIEVED DOCUMENTS:

Document 1:
table in odd semester for converting to BS-MS.  
— BTH/BSH and BTM/BSM students should be considered eligible for PG programs at par with the 
students from existing 4-year programs.  
If all the necessary courses as per BS template (semester 3rd to semester 8th) are satisfied, then the 
department permits the BSH and BSM students to convert to BS-MS.  
 
Bachelors-Masters (five-year Dual-degree) program (Category B).  
Mandatory UG Components 
PG Components 
 
Semester 9 
Semester 10

Document 2:
(M. Tech. Seminar) 
 
CE780A [9] 
 
 
At least additional 26 PG-OE credits in consultation and with consent 
of thesis supervisor 
Other Courses from BT template of parent Department 
Max. 65
Max. 65
Max. 65
Max. 65
AP-211 
556th SENATE MEETING

Document 3:
Option: 1 
‒ 
For students admitted to any department offering BT program, the proposed name is 
Bachelors in General Engineering 
‒ 
The degree certificate will mention the following. 
(Name of the can

## Trial 2:
with the programme name written out in full instead of the abbreviation

In [19]:
# ask the initial query
user_question = "What courses am I supposed to take in the 3rd semester of the BT programme in biological sciences and bio-engineering"

# starting AgentState
# only the keys declared in AgentState
initial_state: AgentState = {
    "question": user_question,
    "docs": [],
    "answer": "",
}

state = initial_state.copy()

for step_output in app.stream(initial_state):
    if "retrieve" in step_output:
        print("\n==============================")
        print("📚 RETRIEVED DOCUMENTS:\n")
        for i, doc in enumerate(step_output["retrieve"]["docs"], 1):
            print(f"Document {i}:\n{doc}\n")

    elif "generate" in step_output:
        answer = step_output["generate"]["answer"]
        print("\n==============================")
        print("🤖 GENERATED ANSWER:\n")
        print(answer.strip(), "\n")
   
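    # step_output is {node_name: partial_update}; merge the inner dicts into state
    for node_update in step_output.values():
        state.update(node_update)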


📚 RETRIEVED DOCUMENTS:

Document 1:
student can take at most 1 UGP per semester (in any order).  
 
Credit Table for BT Program in Biological Sciences and Bioengineering  
Course type 
Recommended Credit range
Credit in the department template
Institute Core (IC) 
112  
112
E/SO 
18-45 
18-20 
Department requirements 
144-179 
161 (98 DC + 63 DE) 
Open electives (OE) 
51-57 
54 
SCHEME 
54-58 
54-58 
Total for 4-year BT/BS 
391-420 
399-405 
 
8.2 
 Template for BTH program in Biological Sciences and Bioengineering

Document 2:
13 
 
BSE642A (09): Microbiology and Immunology 
 
BSE652A (09): Developmental Biology 
 
BSE653A (09): Functional Genomics 
 
BSE654A (09): Human Molecular Genetics 
 
BSE655A (09): Physiology 
 
BSE661A (09): Biological Membranes 
 
— CPI criteria for BTH: 8.5 
 
8.3 
Template for the BTM program in Biological Sciences and Bioengineering 
 
Students have to take 54 credits of course from the Management Track Basket (MTB)

Document 3:
BSE223 (9) 
BSE411 

# INFERENCES
- In the first trial, where the question used an abbreviation and was fairly vague, the retriever failed to fetch the relevant chunks, let alone lead to a correct answer.
- In the second trial, after a slight change to the prompt template and with the programme name written out in full instead of abbreviated, it was able to answer the question accurately.
- Switching between MMR and similarity search did not make a noticeable difference.
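
One way to act on the first inference would be to expand known abbreviations in the question before it reaches the retriever, so that the query text matches the wording in the template. The sketch below is a hypothetical pre-processing step, not something this notebook implements. The `ABBREVIATIONS` table and the `expand_abbreviations` helper are assumptions, and the expansions would need to be checked against the actual template PDF.

```python
import re
from typing import Dict

# Hypothetical lookup table; the entries are assumptions based on the wording
# seen in the retrieved chunks ("BT Program in Biological Sciences and Bioengineering").
ABBREVIATIONS: Dict[str, str] = {
    "BT-BSBE": "BT program in Biological Sciences and Bioengineering",
    "BSBE": "Biological Sciences and Bioengineering",
}


def expand_abbreviations(question: str, table: Dict[str, str] = ABBREVIATIONS) -> str:
    """Replace whole-word abbreviations in the question with their full names."""
    # longest keys first, so "BT-BSBE" is tried before "BSBE"
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: table[m.group(1)], question)


question = "What courses am I supposed to take in the 3rd semester of BT-BSBE"
expanded = expand_abbreviations(question)
print(expanded)
```

Such a step could be applied to `user_question` before building the initial state, or added as its own node in front of `retrieve`.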