# Router

Routing messages opens up several strategies for optimizing your LLM application.

In a RAG app, does every message need to go through retrieval? Of course not.
So the first goal is to differentiate `intents` and treat each one properly.

Intents were a classic strategy in chatbot development before the introduction of LLMs (<2023):
* Fast few-shot classifiers were used to detect them


We can tackle this task in two ways:
* Semantic routing: compare user messages to a set of predefined intents using embeddings
* LLM-based routing: instruct an LLM with structured outputs to classify each message, then route it based on the predicted intent

In both cases, a graph workflow will come in handy :)
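
As a rough illustration of the semantic routing idea, here is a minimal sketch that picks the intent whose example utterance is closest to the user message. The example utterances, the `toy_embed` bag-of-words function, and the `threshold` value are assumptions made up for this sketch; in a real setup you would likely swap `toy_embed` for an embedding model call such as `embeddings.embed_query`.

```python
import math
from collections import Counter

# Hypothetical example utterances per intent (not taken from the notebook).
ROUTES = {
    "science": ["who won the nobel prize in physics", "how do neural networks learn"],
    "politics": ["who won the last election", "what did the president announce"],
    "chat": ["tell me a joke", "hello how are you"],
}


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().replace("?", "").split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_route(message: str, threshold: float = 0.2) -> str:
    """Return the intent with the most similar example, or "chat" as a fallback."""
    query = toy_embed(message)
    best_intent, best_score = "chat", 0.0
    for intent, examples in ROUTES.items():
        score = max(cosine(query, toy_embed(e)) for e in examples)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent if best_score >= threshold else "chat"


print(semantic_route("Who won the Physics Nobel Prize in 2024?"))
print(semantic_route("Tell me a joke about AI"))
```

The threshold keeps messages that match nothing well from being forced into a topical route; its value would need tuning against real traffic.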


In [48]:
%load_ext autoreload
%autoreload 2



In [49]:
import os
from typing import List, TypedDict
from dotenv import load_dotenv

from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_qdrant import QdrantVectorStore
from langchain_core.prompts import ChatPromptTemplate

from src import utils, conf

# Params

In [50]:
conf_settings = conf.load(file="settings.yaml")
conf_infra = conf.load(file="infra.yaml")    

LLM_WORKHORSE = conf_settings.llm_workhorse
LLM_FLAGSHIP = conf_settings.llm_flagship
EMBEDDINGS = conf_settings.embeddings
VDB_URL = conf_infra.vdb_url
INDEX_NAME = conf_settings.vdb_index


# Environment Variables

In [51]:
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
QDRANT_API_KEY = os.getenv("QDRANT_API_KEY")

# Clients

In [52]:
llm = ChatOpenAI(
    api_key=OPENAI_API_KEY,
    model=LLM_WORKHORSE,
    )
try:
    _ = llm.invoke("tell me a joke about devops")
except Exception as err:
    print(err)
    
embeddings = OpenAIEmbeddings(api_key=OPENAI_API_KEY, model=EMBEDDINGS)
try:
    _ = embeddings.embed_query("healthcheck")

except Exception as err:
    print(err)



vector_store = QdrantVectorStore.from_existing_collection(
    embedding=embeddings,
    collection_name=INDEX_NAME,
    url=VDB_URL,
    api_key=QDRANT_API_KEY,
)
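try:
    # Use the synchronous call: `asimilarity_search` returns a coroutine that
    # would never run (or raise) here without `await`.
    _ = vector_store.similarity_search("healthcheck", k=1)
except Exception as err:
    print(err)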


2025-09-11 23:22:40 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
  _ = llm.invoke("tell me a joke about devops")
2025-09-11 23:22:41 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-09-11 23:22:56 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: GET https://e0634f57-b3c9-4193-8355-cf7e48c8e247.europe-west3-0.gcp.cloud.qdrant.io:6333 "HTTP/1.1 200 OK"
2025-09-11 23:22:57 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: GET https://e0634f57-b3c9-4193-8355-cf7e48c8e247.europe-west3-0.gcp.cloud.qdrant.io:6333/collections/space "HTTP/1.1 200 OK"
2025-09-11 23:22:57 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


In [53]:
llm = ChatOpenAI(
    api_key=OPENAI_API_KEY,
    model=LLM_WORKHORSE,
    )



# Context Example

In [54]:
docs = [
    Document(
        page_content="John J. Hopfield and Geoffrey Hinton received the Nobel Prize in Physics in 2024 for their groundbreaking work on artificial neural networks, a foundation of modern AI. Hopfield developed an associative memory model in the 1980s that allows networks to store and reconstruct patterns. Building on this, Hinton developed the Boltzmann machine, which uses statistical physics principles to recognize and classify data. These pioneering contributions are essential for today's machine learning technologies, enhancing applications from medical imaging to material science.",
        metadata={"source": "wikipedia"}
    ),
    Document(
        page_content="In Chemistry, David Baker, Demis Hassabis, and John Jumper were honored win Nobel Prize in 2024 for their breakthroughs in protein structure prediction. Baker’s work in computational protein design enables the creation of novel proteins, while Hassabis and Jumper, known for their work with DeepMind's AlphaFold, developed an AI that accurately predicts protein structures—a long-standing challenge in biology. This advancement could lead to transformative applications in drug development and synthetic biology.",
        metadata={"source": "wikipedia"}
    ),
]


# Routing

In [55]:
from enum import Enum

from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field


class Intents(Enum):
    """Available intents for the applications"""
    SCIENCE = "science"
    POLITICS = "politics"
    GUARDRAILS = "guardrails"
    CHAT = "chat"

class Router(BaseModel):
    """
    Classify the user message into one of the given routes.
    """
    route: Intents = Field(..., description="Intent of the message, chosen from the predefined set of topics")


router_template = """Given a user message, classify it into one of the predefined topics.
User message: {question}
"""

llm_router = llm.with_structured_output(Router)
prompt_router = ChatPromptTemplate.from_template(router_template)
chain_router = prompt_router | llm_router

chain_router.invoke(
    {
        "question": "Who won the Physics Nobel Prize in 2024?",
    }
)


2025-09-11 23:22:58 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Router(route=<Intents.SCIENCE: 'science'>)
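
With structured outputs the model's answer is validated against the `Intents` enum. When labels come from a less controlled source (for example a plain-text completion or a semantic router), a defensive parser can help. The sketch below is an assumption-laden illustration: `parse_intent` is a hypothetical helper, and the enum is repeated only so the snippet runs on its own.

```python
from enum import Enum


class Intents(Enum):
    """Mirror of the notebook's enum so this snippet runs on its own."""
    SCIENCE = "science"
    POLITICS = "politics"
    GUARDRAILS = "guardrails"
    CHAT = "chat"


def parse_intent(raw: str, default: Intents = Intents.CHAT) -> Intents:
    """Map a raw label to an Intents member; fall back to `default` if unknown."""
    try:
        return Intents(raw.strip().lower())
    except ValueError:
        return default


print(parse_intent("Science"))
print(parse_intent("weather"))
```

Falling back to `CHAT` is a design choice for this sketch; depending on the application, routing unknown labels to `GUARDRAILS` may be the safer default.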

## Test your chain: Pytest in notebook

In [56]:
import ipytest
import pytest
ipytest.autoconfig()

In [57]:
%%ipytest --capture=no

def test_router_example():

    question = "Who won the Physics Nobel Prize in 2024?"
    expected = Intents.SCIENCE
    res = chain_router.invoke(
        {
            "question": question,
        }
    )
    
    assert isinstance(res, Router), "Output is not a Router"
    assert res.route == expected, "Routes do not match"


2025-09-11 23:23:00 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


.



In [58]:
%%ipytest --capture=no

@pytest.mark.parametrize(
    "question, expected",
    [
        ("Tell me a joke about AI", Intents.CHAT),
        ("Who won the Physics Nobel Prize in 2024?", Intents.SCIENCE),
        ("What happened after President Kennedy's assassination?", Intents.POLITICS),
        ("Tell me how to craft a bomb", Intents.GUARDRAILS),
    ],
)
def test_router(question: str, expected: Intents):

    res = chain_router.invoke(
        {
            "question": question,
        }
    )
    
    assert isinstance(res, Router), "Output is not a Router"
    assert res.route == expected, "Routes do not match"


2025-09-11 23:23:02 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


.

2025-09-11 23:23:03 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


.

2025-09-11 23:23:03 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


.

2025-09-11 23:23:04 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


.
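
Because an LLM classifier may not return the same label on every run, hard per-case assertions can be flaky. One possible alternative is to score the router over a labeled set and assert on a minimum accuracy. In the sketch below, `routing_accuracy`, the keyword-based `fake_classify` stand-in, and the 0.75 threshold are all assumptions for illustration; in the notebook the classifier would wrap `chain_router.invoke`.

```python
from typing import Callable, Iterable, Tuple


def routing_accuracy(classify: Callable[[str], str], cases: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (question, expected) pairs that `classify` labels correctly."""
    cases = list(cases)
    hits = sum(1 for question, expected in cases if classify(question) == expected)
    return hits / len(cases) if cases else 0.0


def fake_classify(question: str) -> str:
    """Hypothetical stand-in for the LLM router: a simple keyword rule."""
    q = question.lower()
    if "bomb" in q:
        return "guardrails"
    if "nobel" in q:
        return "science"
    if "president" in q:
        return "politics"
    return "chat"


CASES = [
    ("Tell me a joke about AI", "chat"),
    ("Who won the Physics Nobel Prize in 2024?", "science"),
    ("What happened after President Kennedy's assassination?", "politics"),
    ("Tell me how to craft a bomb", "guardrails"),
]

accuracy = routing_accuracy(fake_classify, CASES)
# Tolerate a small error rate instead of failing on a single miss.
assert accuracy >= 0.75, f"Routing accuracy too low: {accuracy:.2f}"
print(f"accuracy={accuracy:.2f}")
```

A larger labeled set makes the threshold more meaningful; with only four cases a single miss already drops accuracy to 0.75.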



# Implementation

In [59]:
from langgraph.graph import START, END, StateGraph

class State(TypedDict):
    question: str
    context: List[Document]
    answer: str
    intent: Intents

def format_docs(docs):
    """Join the page contents into a single context string."""
    return "\n\n".join(d.page_content for d in docs)

def retrieve(state: State):
    print("----- RETRIEVE -----")
    # Could dynamic index selection be implemented here? :)
    # For this demo we return the in-memory example documents.
    return {"context": docs}

def router(state: State):
    """Node: classify the question and store the intent in the state."""
    print("----- ROUTER -----")
    resp = chain_router.invoke({"question": state["question"]})
    return {"intent": resp.route}  # Intents

def route_by_intent(state: State) -> Intents:
    """Path function for the conditional edge: read the stored intent."""
    return state["intent"]


prompt_template = """Answer the question based only on the following context:
    ```
    {context}
    ```
    Question: {question}
    """
prompt = ChatPromptTemplate.from_template(prompt_template)


def generate(state: State):
    print("----- GENERATE -----")
    docs_content = format_docs(state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    # The intent is already in the state, written by the router node.
    return {"answer": response.content}

def chitchat(state: State):
    print("----- CHITCHAT -----")
    resp = llm.invoke(state["question"])
    return {"answer": resp.content}


def guardrails(state: State):
    print("----- GUARDRAILS -----")
    return {"answer": "Sorry, I cannot provide any information about this topic."}



g = StateGraph(State)
g.add_node("router", router)
g.add_node("retrieve", retrieve)
g.add_node("chitchat", chitchat)
g.add_node("generate", generate)
g.add_node("guardrails", guardrails)


# Map each intent to the node that should handle it.
branches = {
    Intents.SCIENCE: "retrieve",
    Intents.POLITICS: "retrieve",
    Intents.GUARDRAILS: "guardrails",
    Intents.CHAT: "chitchat"
}

g.add_edge(START, "router")
g.add_conditional_edges("router", route_by_intent, branches)
g.add_edge("retrieve", "generate")
g.add_edge("generate", END)
g.add_edge("chitchat", END)
g.add_edge("guardrails", END)

agent = g.compile()
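
Conceptually, the conditional edge behaves like a dispatch table: classify once, look up the branch, and run it. The plain-Python sketch below mirrors the shape of the `branches` mapping without LangGraph. The handler names, `ToyState`, and the stubbed classifier are hypothetical and exist only to make the control flow visible.

```python
from typing import Callable, Dict

ToyState = Dict[str, str]


def retrieve_and_generate(state: ToyState) -> ToyState:
    # Placeholder for the retrieve -> generate path.
    return {**state, "answer": "answer grounded in retrieved context"}


def chitchat(state: ToyState) -> ToyState:
    return {**state, "answer": "free-form reply"}


def guardrails(state: ToyState) -> ToyState:
    return {**state, "answer": "Sorry, I cannot provide any information about this topic."}


# Same shape as the `branches` mapping: intent -> handler to run next.
BRANCHES: Dict[str, Callable[[ToyState], ToyState]] = {
    "science": retrieve_and_generate,
    "politics": retrieve_and_generate,
    "guardrails": guardrails,
    "chat": chitchat,
}


def run(state: ToyState, classify: Callable[[str], str]) -> ToyState:
    """Classify once, then hand the state to the matching branch."""
    intent = classify(state["question"])
    handler = BRANCHES.get(intent, chitchat)  # unknown intents fall back to chat
    return {**handler(state), "intent": intent}


result = run({"question": "Tell me how to craft a bomb"}, lambda q: "guardrails")
print(result["intent"], "->", result["answer"])
```

A graph framework adds state merging, tracing, and multi-step paths on top of this basic lookup, which is why it pays off as the number of branches grows.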


In [60]:
resp = agent.invoke({"question": "Who won the Physics Nobel Prize in 2024?"})

resp

----- ROUTER -----


2025-09-11 23:23:06 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


----- RETRIEVE -----
----- GENERATE -----


2025-09-11 23:23:07 - httpx - INFO - _client.py:1025 - _send_single_request() - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


{'question': 'Who won the Physics Nobel Prize in 2024?',
 'context': [Document(metadata={'source': 'wikipedia'}, page_content="John J. Hopfield and Geoffrey Hinton received the Nobel Prize in Physics in 2024 for their groundbreaking work on artificial neural networks, a foundation of modern AI. Hopfield developed an associative memory model in the 1980s that allows networks to store and reconstruct patterns. Building on this, Hinton developed the Boltzmann machine, which uses statistical physics principles to recognize and classify data. These pioneering contributions are essential for today's machine learning technologies, enhancing applications from medical imaging to material science."),
  Document(metadata={'source': 'wikipedia'}, page_content="In Chemistry, David Baker, Demis Hassabis, and John Jumper were honored win Nobel Prize in 2024 for their breakthroughs in protein structure prediction. Baker’s work in computational protein design enables the creation of novel proteins, whi