# Semantic Routing

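## 📘 Introduction

When building a **RAG (Retrieval-Augmented Generation)** system, we often need to find the content most relevant to a user's query in a large body of data. However, the topic of a query is often **implicit**: the user does not state which category of information to search.  
In that case, retrieving directly by semantic similarity can return many results that are semantically close but belong to different topics, which lowers the quality of the generated answer.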

---

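## 🐱 Example

Suppose the user enters:

> "Why does a cat's tail puff up?"

Intuitively, we know this is a question about cats.  
With semantic similarity retrieval alone, however, we may also pull in content that mentions "tail" or "puffing up" but whose topic is dogs or people.  
This reduces the accuracy of the answer.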

---

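## ⚙️ Traditional Approach: LLM Topic Extraction (LLM-based Routing)

A reliable approach is to let an **LLM determine the topic of the query first**, then restrict the retrieval scope to that topic.  
For example:

1. Use an LLM to analyze the query → topic: "cat"  
2. Filter the database down to content whose topic is "cat"  
3. Run similarity retrieval and answer generation on that subset  

This approach is accurate and stable, but it has two drawbacks:
- High cost (every query requires an LLM call)
- Higher latency (longer response time)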
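
The three-step flow above can be sketched as follows. This is a minimal illustration, not a production implementation: `classify_topic` is a hypothetical placeholder for the LLM call (a keyword check keeps the sketch runnable), and the documents and their two-dimensional embeddings are made up.

```python
import numpy as np

# Hypothetical documents: (topic label, text, toy embedding)
documents = [
    ("cat", "Why a cat's tail puffs up", np.array([0.9, 0.3])),
    ("dog", "Why a dog's tail wags", np.array([0.8, 0.5])),
    ("cat", "How to trim a cat's claws", np.array([0.2, 0.9])),
]


def classify_topic(query: str) -> str:
    """Placeholder for the LLM call (assumption): a real system would prompt an LLM here."""
    return "cat" if "cat" in query.lower() else "unknown"


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def retrieve(query: str, query_vector: np.ndarray, top_k: int = 1) -> list:
    topic = classify_topic(query)                         # step 1: extract the topic
    candidates = [d for d in documents if d[0] == topic]  # step 2: filter by topic
    ranked = sorted(                                      # step 3: rank by similarity
        candidates,
        key=lambda d: cosine_similarity(query_vector, d[2]),
        reverse=True,
    )
    return [text for _, text, _ in ranked[:top_k]]


results = retrieve("Why does my cat's tail puff up?", np.array([0.85, 0.4]))
print(results)
```

The dog document is removed in step 2, so it cannot appear in the result no matter how close its embedding is to the query.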

---

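## ⚡ Alternative: Semantic Routing

**Semantic Routing** is a technique that routes queries using only **embedding vectors**, without relying on an LLM.

### 🔍 Basic Concept

1. Build a representative vector (embedding) for each topic (for example "cat", "dog", "bird").  
2. Convert the user's query into a vector.  
3. Compute the semantic similarity between the query and each topic vector.  
4. Route the query to the database or retrieval pipeline of the most similar topic.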

### 🧠 Workflow Overview

User Query → vectorize → compare against each topic vector

↓

Topic identified: "cat"

↓

Route to the "cat" database for retrieval
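
A minimal sketch of this flow, assuming cosine similarity as the similarity measure. The three-dimensional vectors are invented for illustration; a real system would obtain them from an embedding model.

```python
import numpy as np

# Toy topic embeddings (assumed values, not real encoder output)
topic_vectors = {
    "cat": np.array([0.9, 0.1, 0.0]),
    "dog": np.array([0.1, 0.9, 0.0]),
    "bird": np.array([0.0, 0.1, 0.9]),
}


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def route_query(query_vector: np.ndarray) -> tuple:
    """Return the most similar topic and the similarity score of every topic."""
    scores = {
        topic: cosine_similarity(query_vector, vector)
        for topic, vector in topic_vectors.items()
    }
    best_topic = max(scores, key=scores.get)
    return best_topic, scores


# Pretend this is the embedding of the cat-tail query from the example
query_vector = np.array([0.8, 0.2, 0.1])
best_topic, scores = route_query(query_vector)
print(best_topic, scores)
```

The query is then sent only to the retrieval pipeline registered for `best_topic`.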

---

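## ⚖️ Pros and Cons

| Item | LLM-based Routing | Semantic Routing |
|------|------------------|------------------|
| Method | An LLM determines the topic | Embedding similarity determines the topic |
| Cost | High (requires LLM calls) | Low (vector operations only) |
| Speed | Slow | Fast |
| Accuracy | Stable and accurate | Somewhat lower; may misroute |
| Use cases | Critical tasks that need high accuracy | Fast routing, high query volume |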

---

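## 🧩 Practical Recommendations

- If the system has **high accuracy requirements**, use **LLM-based routing** or a hybrid strategy (semantic routing first, falling back to the LLM when confidence is low).  
- If the system must **respond quickly and handle a large volume of queries**, consider **Semantic Routing** first.  
- Topic vectors can also be updated dynamically so that routing decisions track changes in the actual semantic space.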
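
The hybrid strategy in the first recommendation could look like the sketch below. The names `route_with_fallback`, `llm_fallback`, and `min_confidence` are hypothetical, and the 0.6 cutoff is an illustrative value, not a recommended setting.

```python
from typing import Callable, Dict, Optional, Tuple


def route_with_fallback(
    scores: Dict[str, float],
    llm_fallback: Callable[[], Optional[str]],
    min_confidence: float = 0.6,  # illustrative cutoff (assumption)
) -> Tuple[Optional[str], str]:
    """Use the semantic route when it is confident, otherwise ask the LLM."""
    best_topic = max(scores, key=scores.get)
    if scores[best_topic] >= min_confidence:
        return best_topic, "semantic"
    # Low confidence: pay for the slower LLM call only in this branch
    return llm_fallback(), "llm"


# The lambdas stand in for a real LLM call
confident = route_with_fallback({"cat": 0.83, "dog": 0.41}, llm_fallback=lambda: "cat")
uncertain = route_with_fallback({"cat": 0.48, "dog": 0.45}, llm_fallback=lambda: "dog")
print(confident)
print(uncertain)
```

Only queries whose top similarity falls below the cutoff incur the LLM cost, which is the point of the hybrid approach.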

---

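## 🚀 Summary

> **Semantic Routing** is a lightweight way to decide a query's path using vector similarity in semantic space.  
> It performs topic-directed retrieval quickly without calling an LLM,  
> making it an important technique for balancing cost, speed, and stability.

Template source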
- https://medium.com/@shriyansnaik/training-semantic-router-on-custom-data-99667c4e77ca

## Prerequisites:
- Install CMake
- Install Visual Studio:
    - https://stackoverflow.com/questions/40504552/how-to-install-visual-c-build-tools
    - winget install Microsoft.VisualStudio.2022.BuildTools --force --override "--wait --passive --add Microsoft.VisualStudio.Component.VC.Tools.x86.x64 --add Microsoft.VisualStudio.Component.Windows11SDK.22621"

In [None]:
!pip install langchain_openai "semantic-router[local]>=0.1.6"

Strictly speaking, langchain_openai is not required here; it is installed because of how the src.initialization template is designed.

In [None]:
import os

os.chdir("../../../")

In [None]:
from collections import defaultdict

import numpy as np
from semantic_router import Route
from semantic_router.encoders import HuggingFaceEncoder
from semantic_router.routers import SemanticRouter

from src.initialization import credential_init


credential_init()

mathematics_route = Route(
    name="mathematics",
    # score_threshold = 0.3  This value can be set to any number between 0 and 1
    utterances=[
        "How to multiply 2 numbers?",
        "Can you solve 3x + 4 = 10?",
        "What is Pythagoras' theorem?",
    ],
)
biology_route = Route(
    name="biology",
    utterances=[
        "What is the function of the mitochondria?",
        "What is DNA replication?",
        "What are the stages of cell division?",
    ],
)
physics_route = Route(
    name="physics",
    utterances=[
        "What is Newton's first law of motion?",
        "How does a prism separate light?",
        "What is quantum entanglement?",
    ],
)
chitchat_route = Route(
    name="chitchat",
    utterances=[
        "What's your favorite movie?",
        "Have you been to any good restaurants lately?",
        "What kind of music do you listen to?",
    ],
)

routes = [mathematics_route, biology_route, physics_route, chitchat_route]

In [None]:
test_data = [
    ("What is the formula for calculating velocity?", "physics"),
    ("Can you explain photosynthesis?", "biology"),
    ("Derivative of 3x+5?", "mathematics"),
    ("Have you been to any good restaurants lately?", "chitchat"),
    ("What is the capital of France?", None),
    ("How does a prism separate light?", "physics"),
    ("What are the stages of cell division?", "biology"),
    ("Integrate 6x+10?", "mathematics"),
    ("Do you like pizza or burgers?", "chitchat"),
    ("Can you help me with my computer?", None)
]

## Define the model

In [None]:
encoder = HuggingFaceEncoder(name="BAAI/bge-m3", 
                             huggingface_api_key=os.environ['HuggingFace_API_KEY'])

semantic_router = SemanticRouter(encoder=encoder, routes=routes, auto_sync="local")

### Classification

In [None]:
semantic_router("What is the formula for calculating velocity?")

In [None]:
# Use limit to control the number of routes returned

semantic_router("What is the formula for calculating velocity?", limit=2)

### Evaluate model performance

In [None]:
X_eval, y_eval = zip(*test_data)
accuracy = semantic_router.evaluate(X=X_eval, y=y_eval)
print(f'Accuracy: {accuracy}')

## Training data

The training data is mainly used to fine-tune each route's score threshold.
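
To make the idea concrete, here is a simplified sketch of threshold tuning. It is not the library's actual `fit` algorithm: it uses one global threshold instead of one per route, a fixed candidate grid, and invented similarity scores.

```python
# Each sample: (top similarity score, predicted route, expected route).
# The scores are made up; expected=None means the query should match no route.
samples = [
    (0.82, "physics", "physics"),
    (0.74, "biology", "biology"),
    (0.41, "chitchat", None),
    (0.35, "mathematics", None),
    (0.66, "mathematics", "mathematics"),
]


def accuracy_at(threshold: float) -> float:
    """Accuracy when predictions scoring below the threshold are turned into None."""
    correct = 0
    for score, predicted, expected in samples:
        output = predicted if score >= threshold else None
        correct += output == expected
    return correct / len(samples)


candidates = [0.1, 0.3, 0.5, 0.7, 0.9]  # hypothetical search grid
best_threshold = max(candidates, key=accuracy_at)
print(best_threshold, accuracy_at(best_threshold))
```

A threshold that is too low lets out-of-scope queries through, while one that is too high rejects valid ones; labeled data lets us pick the value in between.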

In [None]:
semantic_router.get_thresholds()

In [None]:
training_data = [
    # Mathematics
    ("What is 2 plus 2?", "mathematics"),
    ("Can you solve 3x + 4 = 10?", "mathematics"),
    ("What is the square root of 16?", "mathematics"),
    ("How do you find the area of a circle?", "mathematics"),
    ("What is Pythagoras' theorem?", "mathematics"),
    ("Solve for x: 2x - 5 = 7", "mathematics"),
    ("What is 7 multiplied by 8?", "mathematics"),
    ("Explain the quadratic formula.", "mathematics"),
    ("What is the value of pi?", "mathematics"),
    ("How do you calculate the volume of a cylinder?", "mathematics"),
    # Biology
    ("What is the function of the mitochondria?", "biology"),
    ("Can you explain photosynthesis?", "biology"),
    ("What are the stages of cell division?", "biology"),
    ("What is DNA replication?", "biology"),
    ("How do plants make their food?", "biology"),
    ("What is cellular respiration?", "biology"),
    ("Can you describe the process of evolution?", "biology"),
    ("What are enzymes and what do they do?", "biology"),
    ("What is the difference between prokaryotic and eukaryotic cells?", "biology"),
    ("How do vaccines work?", "biology"),
    # Physics
    ("What is Newton's first law of motion?", "physics"),
    ("How does a prism separate light?", "physics"),
    ("What is the formula for calculating velocity?", "physics"),
    ("Can you explain the theory of relativity?", "physics"),
    ("What is quantum entanglement?", "physics"),
    ("Explain the concept of inertia.", "physics"),
    ("What is the speed of light?", "physics"),
    ("How does a pendulum work?", "physics"),
    ("What is gravitational force?", "physics"),
    ("Describe the laws of thermodynamics.", "physics"),
    # Chitchat
    ("What's your favorite movie?", "chitchat"),
    ("Do you like pizza or burgers?", "chitchat"),
    ("Have you been to any good restaurants lately?", "chitchat"),
    ("What kind of music do you listen to?", "chitchat"),
    ("Tell me a joke!", "chitchat"),
    ("What's your favorite book?", "chitchat"),
    ("Do you have any hobbies?", "chitchat"),
    ("What's the most interesting place you've visited?", "chitchat"),
    ("What did you do over the weekend?", "chitchat"),
    ("Have you seen any good movies recently?", "chitchat"),
    # None
    ("What time is the meeting scheduled for?", None),
    ("Who all are invited to the party?", None)
]

In [None]:
X_train, y_train = zip(*training_data)

semantic_router.fit(X=X_train, y=y_train)

In [None]:
semantic_router.get_thresholds()

## Update semantic_router

In [None]:
thresholds = semantic_router.get_thresholds()

In [None]:
semantic_router.routes

In [None]:
semantic_router.routes[0].utterances

In [None]:
utternace_updated = defaultdict(list)

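# Copy the utterances already defined on each route, so the original lists are not mutated
for route in semantic_router.routes:
    utternace_updated[route.name] = list(route.utterances)

# Append each training query to its route, skipping unlabeled rows and duplicates
for query, label in training_data:
    if label in utternace_updated and query not in utternace_updated[label]:
        utternace_updated[label].append(query)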

In [None]:
# Rebuild the routes with the extended utterances and the fitted thresholds
routes = [
    Route(name=category, utterances=utterances, score_threshold=thresholds[category])
    for category, utterances in utternace_updated.items()
]

In [None]:
semantic_router = SemanticRouter(encoder=encoder, routes=routes, auto_sync="local")

In [None]:
accuracy = semantic_router.evaluate(X=X_eval, y=y_eval)
print(f'Accuracy: {accuracy}')

# Hybrid Router

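In addition to the semantic (dense) encoder, we add a sparse encoder to capture specific keywords. The meaning of a sentence changes little when one specific item is swapped for another, so a dense encoder alone may fail to return the content you actually want.
See the Ensemble Retriever from week 1; the underlying principle is the same.

A sparse encoder (an algorithm such as BM25) tends to be more useful for searches involving items and brands, because it essentially matches keywords.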
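
The toy sketch below illustrates why a keyword signal helps. It is not how `HybridRouter` scores routes internally: the dense scores are invented, the route names are hypothetical, and a simple token-overlap ratio stands in for BM25.

```python
def keyword_overlap(query: str, utterance: str) -> float:
    """Crude sparse score: share of query tokens found in the utterance (stand-in for BM25)."""
    query_tokens = query.lower().split()
    utterance_tokens = set(utterance.lower().split())
    return sum(token in utterance_tokens for token in query_tokens) / len(query_tokens)


def hybrid_score(dense: float, sparse: float, alpha: float = 0.5) -> float:
    """Weighted blend of the two signals; alpha=1.0 would be dense only."""
    return alpha * dense + (1 - alpha) * sparse


query = "iphone 15 charger"
route_utterances = {"apple": "iphone 15 cable", "samsung": "galaxy s24 charger"}
# Invented dense similarities: swapping the brand barely changes the semantics
dense_scores = {"apple": 0.84, "samsung": 0.88}

hybrid_scores = {
    name: hybrid_score(dense_scores[name], keyword_overlap(query, utterance))
    for name, utterance in route_utterances.items()
}
dense_best = max(dense_scores, key=dense_scores.get)
hybrid_best = max(hybrid_scores, key=hybrid_scores.get)
print(dense_best, hybrid_best)
```

With these numbers the dense scores alone pick the wrong brand, while the blended score picks the route that shares the brand keyword.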

This step requires an Aurelio API key.
Here is a $5 promotion code: JBCODEAGENT

In [None]:
from semantic_router.encoders.aurelio import AurelioSparseEncoder
from semantic_router.routers import HybridRouter

mathematics_route = Route(
    name="mathematics",
    # score_threshold = 0.3  This value can be set to any number between 0 and 1
    utterances=[
        "How to multiply 2 numbers?",
        "Can you solve 3x + 4 = 10?",
        "What is Pythagoras' theorem?",
    ],
)
biology_route = Route(
    name="biology",
    utterances=[
        "What is the function of the mitochondria?",
        "What is DNA replication?",
        "What are the stages of cell division?",
    ],
)
physics_route = Route(
    name="physics",
    utterances=[
        "What is Newton's first law of motion?",
        "How does a prism separate light?",
        "What is quantum entanglement?",
    ],
)
chitchat_route = Route(
    name="chitchat",
    utterances=[
        "What's your favorite movie?",
        "Have you been to any good restaurants lately?",
        "What kind of music do you listen to?",
    ],
)

routes = [mathematics_route, biology_route, physics_route, chitchat_route]

sparse_encoder = AurelioSparseEncoder(name="bm25")

hybrid_router = HybridRouter(
    encoder=encoder, sparse_encoder=sparse_encoder, routes=routes, auto_sync="local"
)

In [None]:
accuracy = hybrid_router.evaluate(X=X_eval, y=y_eval)
print(f'Accuracy: {accuracy}')

In [None]:
hybrid_router.fit(X=X_train, y=y_train)

In [None]:
hybrid_router.get_thresholds()

In [None]:
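# Use the thresholds fitted on the hybrid router, not the ones from the dense-only router
hybrid_thresholds = hybrid_router.get_thresholds()
routes = [
    Route(name=category, utterances=utterances, score_threshold=hybrid_thresholds[category])
    for category, utterances in utternace_updated.items()
]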

In [None]:
hybrid_router = HybridRouter(
    encoder=encoder, sparse_encoder=sparse_encoder, routes=routes, auto_sync="local"
)

In [None]:
accuracy = hybrid_router.evaluate(X=X_eval, y=y_eval)
print(f'Accuracy: {accuracy}')