# Router architecture
This architecture represents a step up in autonomy by assigning **decision-making responsibility** to the LLM itself.  
Unlike the **chain architecture**, which always follows a static, predefined sequence of steps, the **router architecture** enables the LLM to **dynamically choose** the next action from a set of predefined options.

---

### Example: RAG Application with Multiple Indexes

Consider a Retrieval-Augmented Generation (RAG) system that accesses **multiple document indexes**, each related to a different domain (see Chapter 2 for more on indexing).

To maximize performance, it’s important to **avoid including irrelevant information** in the prompt.  
This means selecting the **most appropriate index** for each incoming query.

---

### Key Idea

Instead of hardcoding logic to choose an index, we use an **LLM to route the request**:

- The LLM evaluates the user’s query.
- Based on that evaluation, it selects the most relevant index from the available options.
- Only the chosen index is queried to retrieve documents for generating the final answer.

---

This router approach allows applications to be more **adaptive** and **efficient**, tailoring their behavior in real time based on the user’s input.

### Router Architecture Flow: Step-by-Step

Let’s break down the process in plain terms:

1. **Index Selection via LLM**  
   An LLM evaluates the user’s query along with descriptions of the available indexes (provided by the developer)  
   → **It selects the most appropriate index** to use.

2. **Document Retrieval**  
   The system queries the **selected index** to find the most relevant documents for the given user query.

3. **Answer Generation via LLM**  
   Another LLM call takes the original user query and the retrieved documents  
   → **It generates a final answer** based on the combined information.
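Before moving to LangGraph, the three steps above can be sketched in plain Python. This is only an illustration under stated assumptions: `fake_router_llm`, `fake_answer_llm`, the keyword rule, and the placeholder documents are hypothetical stand-ins for real LLM calls and retrievers, not part of the implementation that follows.

```python
from typing import Callable

# Hypothetical index descriptions, as a developer might provide them to the router.
INDEX_DESCRIPTIONS = {
    "records": "medical records: diagnosis, treatment, prescriptions",
    "insurance": "insurance FAQs: policies, claims, coverage",
}


def fake_router_llm(query: str, descriptions: dict[str, str]) -> str:
    """Step 1 stand-in: return an index name (a real router would prompt an LLM)."""
    insurance_words = ("covered", "claim", "policy", "coverage")
    if any(word in query.lower() for word in insurance_words):
        return "insurance"
    return "records"


# Step 2 stand-ins: one retriever per index, each returning placeholder documents.
RETRIEVERS: dict[str, Callable[[str], list[str]]] = {
    "records": lambda query: ["<placeholder medical record>"],
    "insurance": lambda query: ["<placeholder insurance FAQ>"],
}


def fake_answer_llm(query: str, documents: list[str]) -> str:
    """Step 3 stand-in: a real system would call an LLM with query and documents."""
    return f"Answer to {query!r} based on {len(documents)} document(s)"


def run_router_flow(query: str) -> dict:
    domain = fake_router_llm(query, INDEX_DESCRIPTIONS)  # 1. index selection
    documents = RETRIEVERS[domain](query)                # 2. query only that index
    answer = fake_answer_llm(query, documents)           # 3. answer generation
    return {"domain": domain, "documents": documents, "answer": answer}


result = run_router_flow("Am I covered for COVID-19 treatment?")
print(result["domain"], "|", result["answer"])
```

The point of the sketch is the control flow: only the retriever chosen in step 1 is ever called.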

---

### Implementation with LangGraph

Now that the flow is clear, the next step is to translate this into an implementation using **LangGraph**.


In [2]:
from typing import Annotated, Literal, TypedDict

from langchain_core.documents import Document
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.vectorstores.in_memory import InMemoryVectorStore
from langchain_ollama import OllamaEmbeddings
from langchain_openai import AzureChatOpenAI

from langgraph.graph import END, START, StateGraph
from langgraph.graph.message import add_messages

embeddings = OllamaEmbeddings(model="mxbai-embed-large")
# low temperature: near-deterministic output, suited to the routing decision
model_low_temp = AzureChatOpenAI(azure_deployment="gpt-4o", api_version="2024-10-21", temperature=0.1, model="gpt-4o")
# higher temperature: more varied wording, suited to natural language answers
model_high_temp = AzureChatOpenAI(azure_deployment="gpt-4o", api_version="2024-10-21", temperature=0.7, model="gpt-4o")

class State(TypedDict):
    # to track conversation history
    messages: Annotated[list, add_messages]
    # input
    user_query: str
    # output
    domain: Literal["records", "insurance"]
    documents: list[Document]
    answer: str

class Input(TypedDict):
    user_query: str

class Output(TypedDict):
    documents: list[Document]
    answer: str

# the stores are left empty here; see the full RAG example for how to fill a vector store with documents
medical_records_store = InMemoryVectorStore.from_documents([], embeddings)
medical_records_retriever = medical_records_store.as_retriever()

insurance_faqs_store = InMemoryVectorStore.from_documents([], embeddings)
insurance_faqs_retriever = insurance_faqs_store.as_retriever()

router_prompt = SystemMessage(
    """You need to decide which domain to route the user query to. You have two 
        domains to choose from:
          - records: contains medical records of the patient, such as 
          diagnosis, treatment, and prescriptions.
          - insurance: contains frequently asked questions about insurance 
          policies, claims, and coverage.

OUTPUT ONLY THE DOMAIN NAME."""
)

def router_node(state: State) -> State:
    user_message = HumanMessage(state["user_query"])
    messages = [router_prompt, *state["messages"], user_message]
    res = model_low_temp.invoke(messages)
    return {
        # normalize the reply so "Records" or "records\n" still matches in pick_retriever
        "domain": res.content.strip().lower(),
        # update conversation history
        "messages": [user_message, res],
    }

def pick_retriever(
    state: State,
) -> Literal["retrieve_medical_records", "retrieve_insurance_faqs"]:
    if state["domain"] == "records":
        return "retrieve_medical_records"
    else:
        return "retrieve_insurance_faqs"

def retrieve_medical_records(state: State) -> State:
    documents = medical_records_retriever.invoke(state["user_query"])
    return {
        "documents": documents,
    }

def retrieve_insurance_faqs(state: State) -> State:
    documents = insurance_faqs_retriever.invoke(state["user_query"])
    return {
        "documents": documents,
    }

medical_records_prompt = SystemMessage(
    """You are a helpful medical chatbot who answers questions based on the 
        patient's medical records, such as diagnosis, treatment, and 
        prescriptions."""
)

insurance_faqs_prompt = SystemMessage(
    """You are a helpful medical insurance chatbot who answers frequently asked 
        questions about insurance policies, claims, and coverage."""
)

def generate_answer(state: State) -> State:
    if state["domain"] == "records":
        prompt = medical_records_prompt
    else:
        prompt = insurance_faqs_prompt
    messages = [
        prompt,
        *state["messages"],
        # single quotes inside the f-string keep this valid before Python 3.12
        HumanMessage(f"Documents: {state['documents']}"),
    ]
    res = model_high_temp.invoke(messages)
    return {
        "answer": res.content,
        # update conversation history
        "messages": res,
    }

builder = StateGraph(State, input=Input, output=Output)
builder.add_node("router", router_node)
builder.add_node("retrieve_medical_records", retrieve_medical_records)
builder.add_node("retrieve_insurance_faqs", retrieve_insurance_faqs)
builder.add_node("generate_answer", generate_answer)
builder.add_edge(START, "router")
# conditional edges: the next node depends on the output of the router
builder.add_conditional_edges("router", pick_retriever)
builder.add_edge("retrieve_medical_records", "generate_answer")
builder.add_edge("retrieve_insurance_faqs", "generate_answer")
builder.add_edge("generate_answer", END)

graph = builder.compile()

Show the graph

In [3]:
graph.get_graph().draw_mermaid_png(output_file_path="../img/router.png")

![Router LLM](../img/router.png)

### Understanding Conditional Paths in the Graph

At this stage, the graph becomes more interesting. It now shows the **two possible execution paths**:

- One path flows through `retrieve_medical_records`
- The other flows through `retrieve_insurance_faqs`

Both paths:

- **Start at the router node**, which decides the path based on the user’s input.
- **End at the generate_answer node**, which produces the final response.
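To make the two paths concrete, here is a small sketch that enumerates them. The `EDGES` adjacency list is written by hand to mirror the edges added to the builder earlier; it is an assumption for illustration and is not read from the compiled graph.

```python
# Hand-written copy of the graph's edges (START and END omitted for brevity).
EDGES = {
    "router": ["retrieve_medical_records", "retrieve_insurance_faqs"],
    "retrieve_medical_records": ["generate_answer"],
    "retrieve_insurance_faqs": ["generate_answer"],
    "generate_answer": [],
}


def all_paths(node: str, goal: str) -> list[list[str]]:
    """Return every path from `node` to `goal` by depth-first search."""
    if node == goal:
        return [[node]]
    paths = []
    for next_node in EDGES[node]:
        for tail in all_paths(next_node, goal):
            paths.append([node, *tail])
    return paths


paths = all_paths("router", "generate_answer")
for path in paths:
    print(" -> ".join(path))
```

At run time only one of these paths is taken, depending on the domain the router selects.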

---

### How the Routing Works

The branching is controlled by a **conditional edge**, defined in the `pick_retriever` function.  
This function maps the domain selected by the LLM to the appropriate node (`retrieve_medical_records` or `retrieve_insurance_faqs`).

> In **Figure 5-4**, these conditional edges are depicted as **dotted lines** connecting the source node (router) to the destination nodes.
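Because the domain comes from free-form LLM text, a reply such as `"Records."` would not equal `"records"` and would silently fall into the `else` branch. One possible hardening, sketched below, is to normalize the reply and look it up in an explicit mapping. `ROUTES`, `DEFAULT_DOMAIN`, `normalize_domain`, and `pick_retriever_safe` are hypothetical names, and the choice of fallback is an assumption that mirrors the existing `else` branch.

```python
# Explicit mapping from domain name to the node that handles it.
ROUTES = {
    "records": "retrieve_medical_records",
    "insurance": "retrieve_insurance_faqs",
}
# Assumption: unknown replies fall back to the insurance FAQs, as the `else` branch does.
DEFAULT_DOMAIN = "insurance"


def normalize_domain(raw: str) -> str:
    """Trim whitespace, surrounding quotes and periods, then lowercase."""
    cleaned = raw.strip().strip("\"'.").lower()
    return cleaned if cleaned in ROUTES else DEFAULT_DOMAIN


def pick_retriever_safe(state: dict) -> str:
    """Return the name of the next node for the domain stored in the state."""
    return ROUTES[normalize_domain(state["domain"])]


print(pick_retriever_safe({"domain": "  Records.\n"}))
print(pick_retriever_safe({"domain": "billing"}))
```

A function like this could be passed to `add_conditional_edges` in place of `pick_retriever`. Whether a fallback or an error is the better response to an unknown domain depends on the application.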

---

### Example Inputs and Outputs (with Streaming)

Next, let’s look at example inputs and outputs, this time demonstrating **streaming output**.


In [4]:
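# named `inputs` to avoid shadowing Python's built-in input()
inputs = {
    "user_query": "Am I covered for COVID-19 treatment?"
}
# each streamed chunk is a dict keyed by the node that just ran
for c in graph.stream(inputs):
    print(c)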

{'router': {'domain': 'insurance', 'messages': [HumanMessage(content='Am I covered for COVID-19 treatment?', additional_kwargs={}, response_metadata={}, id='783a2c61-201d-4fcc-8603-bda83dab9143'), AIMessage(content='insurance', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 2, 'prompt_tokens': 90, 'total_tokens': 92, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-2024-11-20', 'system_fingerprint': 'fp_ee1d74bde0', 'id': 'chatcmpl-BZnoeyMcXT3p2QIPrSToQyzIIiLkL', 'service_tier': None, 'prompt_filter_results': [{'prompt_index': 0, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}], 'f

### Understanding the Output Stream

The output stream reflects the values returned by each node executed during the graph’s run. Let’s break it down step by step:

1. **Router Node**  
   - Identified the domain relevant to the user’s query, in this case **"insurance"**.  
   - Returned an update to the `messages` key, which lets the conversation continue using memory, as previously discussed.

2. **pick_retriever Function**  
   - Executed next.  
   - Based on the selected domain, it returned the name of the next node to run.

3. **retrieve_insurance_faqs Node**  
   - Fetched relevant documents from the **insurance FAQs index**.  
   - Indicates that the graph followed the **left path**, as determined by the LLM’s domain selection.

4. **generate_answer Node**  
   - Took the retrieved documents and the original query as input.  
   - Generated a final answer to the user’s question.  
   - Also updated the `messages` key in the state with this final interaction.
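The breakdown above can be mirrored in code. As the printed output shows, each streamed chunk is a dict keyed by a node name, with that node's state update as the value. The sketch below walks a list of such chunks and collects the executed nodes and the final answer. The `chunks` list is a hand-written stand-in with placeholder values, not real output, and `summarize_stream` is a hypothetical helper.

```python
# Hand-written stand-ins shaped like the streamed chunks; the values are placeholders.
chunks = [
    {"router": {"domain": "insurance", "messages": ["<user message>", "<router reply>"]}},
    {"retrieve_insurance_faqs": {"documents": []}},
    {"generate_answer": {"answer": "<final answer text>", "messages": "<answer message>"}},
]


def summarize_stream(stream):
    """Collect the names of the nodes that ran and the last answer seen."""
    executed = []
    answer = None
    for chunk in stream:
        for node_name, update in chunk.items():
            executed.append(node_name)
            if "answer" in update:
                answer = update["answer"]
    return executed, answer


executed, answer = summarize_stream(chunks)
print(executed)
print(answer)
```

Note that `pick_retriever` is an edge function rather than a node, so it would presumably not appear as its own chunk; its effect shows up in which retrieval node runs next.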


Let's go back to the [main file](../README.md).