This system is inspired by the research paper “Talk to the Right Specialists: Routing and Planning in Multi-agent Systems for Question Answering”, which addresses a key problem in enterprise-level Retrieval-Augmented Generation (RAG) systems: scalability and efficiency.

When companies like Tata or Apple deploy agentic assistants, they deal with large volumes of proprietary internal data such as finance, HR, legal policies, and engineering documents. Running similarity search over the entire knowledge base for every user query quickly becomes computationally expensive and slow.

To solve this, the paper proposes an architecture where the knowledge base is subdivided into logical domains (such as HR, medical, and finance). An LLM-based routing module first classifies the incoming query and dispatches it to the most relevant knowledge base. Once routed, the system performs retrieval and response generation on that domain only, which aims to improve response accuracy and reduce latency.

In this project, I implemented a miniature version of this idea for IIT Kanpur. The knowledge base is divided into three domains: academics (which includes grading policies, course structure, and programs like double majors or minors), administration (which includes leaves, reimbursements, and medical insurance), and hostel/mess (which includes hall rules, mess policies, and related logistics).

The query is first passed through an LLM-based router, which semantically classifies the query into one of these three domains. Based on the classification, a domain-specific RAG pipeline is triggered. Each domain has its own vector store (built using Cohere embeddings and ChromaDB) and prompt template, and uses a RetrievalQA chain to fetch relevant information and generate a response.
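
The route-then-answer flow described above can be sketched without any LLM or vector store. In this minimal sketch, `classify` is a hypothetical keyword-based stand-in for the LLM router, and the handlers are placeholder functions standing in for the domain-specific RAG chains; none of these names come from the notebook itself.

```python
from typing import Callable, Dict, Tuple

# Hypothetical keyword lists standing in for the LLM router's judgement.
KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "academic": ("course", "grade", "grading", "minor", "major"),
    "hostel": ("hostel", "hall", "mess"),
    "admin": ("leave", "reimbursement", "insurance"),
}


def classify(query: str) -> str:
    """Return the first domain whose keywords appear in the query, else 'unknown'."""
    text = query.lower()
    for domain, words in KEYWORDS.items():
        if any(word in text for word in words):
            return domain
    return "unknown"


def make_handler(domain: str) -> Callable[[str], str]:
    """Build a placeholder for a domain pipeline (retrieval + generation)."""
    def handler(query: str) -> str:
        return f"[{domain} pipeline] would answer: {query}"
    return handler


# One handler per domain, looked up by the router's label.
PIPELINES: Dict[str, Callable[[str], str]] = {d: make_handler(d) for d in KEYWORDS}


def route_and_answer(query: str) -> str:
    """Classify first, then dispatch only to the matching pipeline."""
    handler = PIPELINES.get(classify(query))
    if handler is None:
        return "Sorry, I couldn't classify your query."
    return handler(query)


print(route_and_answer("Tell me about medical insurance"))
```

Keeping the label-to-pipeline mapping in a dictionary means that adding a fourth domain only requires a new entry, not another branch in the dispatch logic.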

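This architecture is loosely analogous to coarse-to-fine approximate nearest-neighbour indexes such as HNSW (Hierarchical Navigable Small World), which narrow the search space in stages before the final fine-grained search. Here, the narrowing step is an LLM-based classification rather than a similarity search.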

In summary, the idea is to avoid blindly searching across the entire context by semantically understanding the query and sending it to the right “specialist”, in this case a domain-specific RAG pipeline. This structure-aware approach is intended to make the system more efficient and scalable than a vanilla RAG that operates over a single large vector index, and it can improve accuracy by keeping irrelevant domains out of the retrieved context.

In [96]:
!pip install langchain langchain-community cohere chromadb



In [97]:
!pip install pymupdf



In [98]:
from google.colab import userdata
cohere_api_key = userdata.get('COHERE_API_KEY')
groq_api_key = userdata.get('GROQ_API_KEY')

In [99]:
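# These integrations live in langchain_community (installed above); the older
# langchain.* import paths for them are deprecated.
from langchain_community.document_loaders import DirectoryLoader, PyMuPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import CohereEmbeddings
from langchain_community.vectorstores import Chroma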

In [121]:
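def build_chroma_db(folder_path, persist_dir, cohere_api_key):
    """Load every PDF under folder_path, chunk it, embed it, and persist a Chroma store."""
    # Recursively load all PDFs in the domain folder.
    loader = DirectoryLoader(
        path=folder_path,
        glob="**/*.pdf",
        loader_cls=PyMuPDFLoader,
        show_progress=True,
    )
    docs = loader.load()

    # Split into chunks of up to 500 characters with a 50-character overlap,
    # so that context spanning a chunk boundary is not lost entirely.
    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    chunks = splitter.split_documents(docs)

    embedding = CohereEmbeddings(
        cohere_api_key=cohere_api_key,
        model="embed-english-v3.0",
        user_agent="xyz",  # placeholder client identifier
    )

    # Embed the chunks and write the index to persist_dir.
    vectordb = Chroma.from_documents(
        documents=chunks,
        embedding=embedding,
        persist_directory=persist_dir,
    )

    return vectordb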
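
To make `chunk_size=500` and `chunk_overlap=50` concrete, here is a simplified fixed-window chunker. It is only an approximation: `RecursiveCharacterTextSplitter` also prefers to split on separators such as paragraph breaks, so its chunk boundaries will generally differ. `sliding_chunks` is a hypothetical helper, not part of LangChain.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> List[str]:
    """Cut text into fixed-size windows where consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# A 1000-character toy document made of repeating lowercase letters.
sample = "".join(chr(97 + i % 26) for i in range(1000))
pieces = sliding_chunks(sample)
print([len(p) for p in pieces])
```

With these settings each window advances by 450 characters, so the last 50 characters of one chunk are repeated at the start of the next.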

In [101]:
acads_vs = build_chroma_db("Academics", "Academics/academics_db", cohere_api_key)

100%|██████████| 6/6 [00:00<00:00, 36.80it/s]


In [102]:
hostels_vs = build_chroma_db("Hostels", "Hostels/hostels_db", cohere_api_key)

100%|██████████| 2/2 [00:00<00:00, 22.57it/s]


In [103]:
admin_vs = build_chroma_db("Admin", "Admin/admin_db", cohere_api_key)

100%|██████████| 3/3 [00:00<00:00, 55.20it/s]


In [122]:
acad_retriever = acads_vs.as_retriever(search_type="similarity", search_kwargs={"k": 3})
hostel_retriever = hostels_vs.as_retriever(search_type="similarity", search_kwargs={"k": 3})
admin_retriever = admin_vs.as_retriever(search_type="similarity", search_kwargs={"k": 3})
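
Conceptually, a similarity retriever with `k=3` scores every stored chunk embedding against the query embedding and keeps the three best. The sketch below illustrates this with cosine similarity on toy 2-D vectors. This is an assumption for illustration: the metric Chroma actually uses depends on how the collection is configured, and a real index avoids scoring every vector. `cosine` and `top_k` are hypothetical helpers.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float],
          doc_vecs: Sequence[Sequence[float]],
          k: int = 3) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k most similar document vectors."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings"; real Cohere embeddings have far more dimensions.
docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
query = [1.0, 0.1]
print([index for index, _ in top_k(query, docs, k=3)])
```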

In [105]:
!pip install -q groq langchain_groq

In [123]:
from langchain_groq import ChatGroq

groq_api_key = userdata.get('GROQ_API_KEY')
llm = ChatGroq(
    groq_api_key=groq_api_key,
    model_name="llama3-8b-8192",
)

In [107]:
from langchain_core.prompts import PromptTemplate

In [124]:
acad_prompt = PromptTemplate(
    template="""
      You are a helpful AI assistant who is filling in place for the office of academic enquiries, IIT Kanpur.
      Answer the questions only based on context given to you.
      If the answer is not found verbatim in the context, ONLY output: I am sorry, the information you are asking for is unavailable at the moment.

      {context}
      Question: {question}
    """,
    input_variables = ['context', 'question']
)

hostels_prompt = PromptTemplate(
    template="""
      You are a helpful AI assistant who is filling in place for the office of student hall affairs, IIT Kanpur.
      Answer the questions only based on context given to you.
      If the answer is not found verbatim in the context, ONLY output: I am sorry, the information you are asking for is unavailable at the moment.

      {context}
      Question: {question}
    """,
    input_variables = ['context', 'question']
)

admin_prompt = PromptTemplate(
    template="""
      You are a helpful AI assistant who is filling in place for the admin block of IIT Kanpur.
      Answer the questions only based on context given to you.
      If the answer is not found verbatim in the context, ONLY output: I am sorry, the information you are asking for is unavailable at the moment.

      {context}
      Question: {question}
    """,
    input_variables = ['context', 'question']
)

In [125]:
from langchain.chains import RetrievalQA

acad_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=acad_retriever,
    chain_type_kwargs={
        "prompt": acad_prompt
    }
)

hostel_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=hostel_retriever,
    chain_type_kwargs={
        "prompt": hostels_prompt
    }
)

admin_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=admin_retriever,
    chain_type_kwargs={
        "prompt": admin_prompt
    }
)
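
With `chain_type="stuff"`, the retrieved chunks are concatenated and placed into the prompt's `{context}` slot, so the LLM is called once per question. A rough sketch of that step using plain string formatting follows; the template text, the `stuff_prompt` helper, and the blank-line separator are illustrative assumptions rather than LangChain's exact internals.

```python
from typing import List

# Simplified stand-in for the domain prompt templates.
TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "{context}\n"
    "Question: {question}\n"
)


def stuff_prompt(chunks: List[str], question: str, separator: str = "\n\n") -> str:
    """Join all retrieved chunks and fill them into a single prompt."""
    context = separator.join(chunks)
    return TEMPLATE.format(context=context, question=question)


retrieved = [
    "Chunk one: text from the first retrieved passage.",
    "Chunk two: text from the second retrieved passage.",
]
print(stuff_prompt(retrieved, "What does the first passage say?"))
```

Because everything goes into one prompt, the number of retrieved chunks multiplied by the chunk size has to fit within the model's context window.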

In [132]:
from langchain.chains import LLMChain

router_prompt = PromptTemplate(
    template="""
You are an intelligent routing assistant for IIT Kanpur.

Your job is to classify the following user query into one of the three domains:
- academic
- hostel
- admin

If the query is related to courses, grading policy, attendance, double major or minor programmes, return "academic"
If the query is related to hostel, halls of residence, mess policies, return "hostel"
If the query is related to medical insurance, scholarships, leave rules or reimbursement, return "admin"

I want you to return a single word. Cut the fluff, only answer whether the query belongs to academic, hostel, or admin departments as shown below.

Examples:
Q: What is the double major program?
A: academic

Q: How do I apply for a mess rebate?
A: hostel

Q: Where can I get medical reimbursement details?
A: admin

Now classify the following query.

Query: {query}
Answer:
""",
    input_variables=["query"]
)

router_chain = LLMChain(
    llm=llm,
    prompt=router_prompt
)

In [129]:
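# `query` is not defined in any cell shown here; an example admin-domain query is assumed.
query = "Where can I get medical reimbursement details?"
route = router_chain.predict(query=query)
route.strip().lower()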

'admin'

In [137]:
def iitk_router(query: str) -> str:
    """Classify the query with the LLM router and answer it with the matching domain chain."""
    route = router_chain.predict(query=query).strip().lower()

    if "academic" in route:
        return acad_chain.invoke({"query": query})["result"]
    if "hostel" in route:
        return hostel_chain.invoke({"query": query})["result"]
    if "admin" in route:
        return admin_chain.invoke({"query": query})["result"]

    # Fall back to an explicit message instead of implicitly returning None.
    return "Sorry, I couldn't classify your query."
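
The substring checks in `iitk_router` depend on the order of the `if` statements: a router reply that mentions two labels is sent to whichever is tested first. One possible hardening, sketched below, is to accept the reply only when it contains exactly one valid label as a whole word. `parse_route` and `VALID_ROUTES` are hypothetical names, not part of the notebook.

```python
import re
from typing import Optional

VALID_ROUTES = ("academic", "hostel", "admin")


def parse_route(raw: str) -> Optional[str]:
    """Return the single valid label in the router's raw reply, or None if absent or ambiguous."""
    # Lowercase and split into alphabetic words so punctuation and
    # prefixes such as "Answer:" are ignored.
    words = re.findall(r"[a-z]+", raw.lower())
    found = {word for word in words if word in VALID_ROUTES}
    return found.pop() if len(found) == 1 else None


for reply in [" Admin.\n", "Answer: academic", "academic or admin", "I am not sure"]:
    print(repr(reply), "->", parse_route(reply))
```

A `None` result can then be mapped to the fallback message, or to a retry of the router call.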

In [138]:
print(iitk_router("Tell me about medical insurance"))

You're looking for information on medical insurance! According to the provided FAQ, I can tell you that:

The Pan India Cashless Medical Insurance Scheme (PICMIS) is a medical insurance scheme offered by the Institute to provide medical reimbursement towards hospitalization treatment (IPD) expenses. 

And, according to the FAQ, the following are covered under PICMIS:

(I apologize, but the answer is not provided in the given context, so I don't have the information on who is covered under PICMIS. If you need more information, I can try to help you find it!)
