# Lab 3 – Simulating a Mixture‑of‑Experts Router

Route queries to specialized ‘experts’ with an LLM‑based router built in LangChain.

## Environment Setup
Set the following environment variable before running this lab:

| Variable | Purpose |
|----------|---------|
| `OPENAI_API_KEY` | Enables OpenAI models used by LangChain (`ChatOpenAI`). |

Local:
```bash
export OPENAI_API_KEY="sk-..."
```
Colab:
```python
import os
os.environ['OPENAI_API_KEY'] = 'sk-...'
```
⚠️ **Never expose keys publicly.**

In [None]:
import os
if not os.getenv('OPENAI_API_KEY'):
    raise ValueError('OPENAI_API_KEY is not set.')
print('OPENAI key loaded.')

In [None]:
!pip install -q --upgrade langchain langchain-openai matplotlib

In [None]:
# Requires the langchain-openai package (pip install langchain-openai).
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model='gpt-4o-mini', temperature=0)

prompt_animals = ChatPromptTemplate.from_messages([
    ('system', 'You are an expert zoologist.'),
    ('user', '{query}')
])

prompt_plants = ChatPromptTemplate.from_messages([
    ('system', 'You are a botany professor.'),
    ('user', '{query}')
])

# The router is a plain LLM call that returns a single label as text.
router_prompt = ChatPromptTemplate.from_messages([
    ('system', 'Return ANIMALS if the question is about animals, else PLANTS. '
               'Respond with the label only.'),
    ('user', '{query}')
])
router = router_prompt | llm | StrOutputParser()

query = 'How do elephants regulate body temperature?'
# Normalize the label so stray whitespace or casing does not break the branch.
branch = router.invoke({'query': query}).strip().upper()
chain = (prompt_animals if branch == 'ANIMALS' else prompt_plants) | llm | StrOutputParser()
print(chain.invoke({'query': query}))

### ✏️ Exercises
1. Add a third expert on *geography* and update the router.
2. Measure latency of routing vs. a single dense model call.
3. Discuss how this relates to sparse MoE layers in Mixtral.
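
As a starting point for exercise 3, it may help to see what a sparse MoE gate computes. The sketch below is a pure-Python illustration of top-k softmax gating, the scheme Mixtral uses to send each token to 2 of its 8 experts per layer. The helper name `top_k_gate` and the logit values are made up for illustration; a real gate operates on learned projections of the token's hidden state.

```python
import math

def top_k_gate(logits, k=2):
    """Return {expert_index: weight} for the k largest logits.

    Weights are a softmax over the selected logits only, so they sum to 1.
    Every other expert gets no weight and would never be evaluated.
    """
    if not 1 <= k <= len(logits):
        raise ValueError('k must be between 1 and the number of experts')
    # Indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

# Made-up router logits for one token over 8 experts.
logits = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 0.2]
gate = top_k_gate(logits, k=2)
print(gate)
```

The LangChain router in this lab is the `k=1` analogue applied per query rather than per token: one expert is picked and the others are never called.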

## 2. Simulated MoE Router Demo
Following the slides, we'll build a router that picks between two experts (`animals`, `plants`) and optionally a third (`geography`).
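
Before spending API calls, the dispatch logic can be checked offline. The sketch below swaps the LLM router for a keyword matcher and the experts for stub functions; `KEYWORDS`, `keyword_router`, `fake_expert`, and `offline_router_call` are hypothetical names used only for this illustration, and a keyword match is of course far cruder than an LLM classifier.

```python
# Hypothetical keyword lists standing in for the LLM classifier.
KEYWORDS = {
    'ANIMALS': ('dog', 'tiger', 'elephant', 'canine'),
    'PLANTS': ('leaf', 'rose', 'photosynthesis', 'chlorosis'),
    'GEO': ('mountain', 'capital', 'river', 'africa'),
}

def keyword_router(query, default='GEO'):
    """Return the first label whose keyword appears in the query."""
    text = query.lower()
    for label, words in KEYWORDS.items():
        if any(w in text for w in words):
            return label
    return default  # no keyword matched

def fake_expert(label):
    """Stand-in for an expert chain: tags the query with its label."""
    return lambda query: f'<{label} expert answer to: {query}>'

EXPERTS = {label: fake_expert(label) for label in KEYWORDS}

def offline_router_call(query):
    """Route the query, then call only the selected expert."""
    topic = keyword_router(query)
    return topic, EXPERTS[topic](query)

for q in ['Why do dogs bark?', 'What causes leaf chlorosis?',
          'What is the tallest mountain in Africa?']:
    print(offline_router_call(q))
```

Keeping the router and the experts behind a dictionary lookup means adding a new expert is a matter of adding one entry on each side.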

In [None]:
# Requires the langchain-openai package (pip install langchain-openai).
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model='gpt-4o-mini', temperature=0)

prompt_animals = ChatPromptTemplate.from_messages([
    ('system', 'You are an expert zoologist.'),
    ('user', '{query}')
])
prompt_plants = ChatPromptTemplate.from_messages([
    ('system', 'You are an expert botanist.'),
    ('user', '{query}')
])
prompt_geo = ChatPromptTemplate.from_messages([
    ('system', 'You are an expert geographer.'),
    ('user', '{query}')
])

# Map each router label to its expert chain; each chain returns plain text.
experts = {
    'ANIMALS': prompt_animals | llm | StrOutputParser(),
    'PLANTS': prompt_plants | llm | StrOutputParser(),
    'GEO': prompt_geo | llm | StrOutputParser(),
}

router_prompt = ChatPromptTemplate.from_messages([
    ('system',
     'Classify the user question into one of: ANIMALS, PLANTS, GEO.\n'
     'Respond with the class name only.'),
    ('user', '{query}')
])
# A separate model handles routing; the experts all use `llm`.
router = (
    router_prompt
    | ChatOpenAI(model='gpt-3.5-turbo', temperature=0)
    | StrOutputParser()
)

def router_call(query):
    """Return (topic, answer) for a query."""
    topic = router.invoke({'query': query}).strip().upper()
    if topic not in experts:
        topic = 'GEO'  # unrecognized labels fall back to the geography expert
    return topic, experts[topic].invoke({'query': query})

sample_qs = ['Why do dogs bark?', 'What causes leaf chlorosis?', 'What is the tallest mountain in Africa?']
for q in sample_qs:
    t, ans = router_call(q)
    print(f'[{t}] {q} -> {ans[:80]}...\n')

## 3. Latency Benchmark vs Dense Model

In [None]:
import time
dense_llm = ChatOpenAI(model='gpt-3.5-turbo', temperature=0)

def dense_call(q):
    return dense_llm.invoke(q)

q = 'Explain photosynthesis in simple terms.'

# One LLM call.
t0 = time.perf_counter()
dense_call(q)
t_dense = time.perf_counter() - t0

# Two sequential LLM calls: the router, then the selected expert.
t0 = time.perf_counter()
router_call(q)
t_router = time.perf_counter() - t0

print(f'Dense latency: {t_dense:.2f}s, Router: {t_router:.2f}s')
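
A single timed call is noisy, since network jitter can easily exceed the difference being measured. A minimal sketch of a repeated measurement follows; `time_call`, `fake_dense`, and `fake_router` are hypothetical names, and the sleeping stand-ins exist only so the sketch runs without an API key. In the notebook, `dense_call` and `router_call` would be passed instead.

```python
import statistics
import time

def time_call(fn, *args, repeats=5):
    """Call fn(*args) `repeats` times and return each duration in seconds."""
    durations = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        durations.append(time.perf_counter() - t0)
    return durations

# Stand-ins for dense_call / router_call (assumed latencies, not measurements).
def fake_dense(q):
    time.sleep(0.01)   # one model call

def fake_router(q):
    time.sleep(0.02)   # router call followed by an expert call

q = 'Explain photosynthesis in simple terms.'
dense = time_call(fake_dense, q)
routed = time_call(fake_router, q)
print(f'Dense median: {statistics.median(dense):.3f}s, '
      f'Router median: {statistics.median(routed):.3f}s')
```

Reporting the median rather than the mean keeps one slow request from dominating the comparison.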

## 4. Router Load‑Balancing Visualization

In [None]:
import matplotlib.pyplot as plt
queries = [
    'Tell me about the Amazon rainforest',
    'Describe a tiger\'s diet',
    'How high is Mount Everest?',
    'Why are roses red?',
    'Explain canine behavior',
    'What is the capital of Japan?'
]
counts = {'ANIMALS': 0, 'PLANTS': 0, 'GEO': 0}
for q in queries:
    t, _ = router_call(q)
    counts[t] = counts.get(t, 0) + 1  # tolerate an unexpected label

plt.bar(list(counts.keys()), list(counts.values()))
plt.title('Router Expert Utilization')
plt.ylabel('# Queries')
plt.show()
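
The bar chart shows raw counts; a single number can make runs easier to compare. One option, sketched below under the assumption that "balanced" means "close to uniform", is the normalized entropy of the routing distribution. `utilization` and `balance_score` are hypothetical helper names, and the example counts are made up rather than taken from a real run.

```python
import math

def utilization(counts):
    """Return each expert's share of the routed queries."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError('no queries were routed')
    return {k: v / total for k, v in counts.items()}

def balance_score(counts):
    """Normalized entropy of the routing distribution.

    1.0 means queries are spread uniformly; 0.0 means one expert got them all.
    """
    if len(counts) < 2:
        return 1.0  # a single expert is trivially balanced
    shares = [s for s in utilization(counts).values() if s > 0]
    entropy = sum(s * math.log(1 / s) for s in shares)
    return entropy / math.log(len(counts))

example = {'ANIMALS': 2, 'PLANTS': 2, 'GEO': 2}  # made-up counts
print(utilization(example), balance_score(example))
```

In the notebook, passing the `counts` dictionary from the cell above to `balance_score` would summarize the plotted distribution.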

## ✏️ Exercises (Lab 3)
1. **Add a Coding Expert** – Create a system prompt for a `coding` expert and add a `DEV` class to the router so that programming questions reach it.
2. **Latency Profiling** – Measure latency for 20 diverse queries; plot boxplots comparing dense vs router.
3. **Load‑Balancing Tuning** – Raise the router LLM's sampling `temperature` (the analogue of adding noise to a router softmax) and observe how the expert distribution changes.
4. **Router Robustness Test** – Craft a prompt‑injection attempt to force the router to choose the wrong expert; patch the router prompt to defend.
5. **Cost Estimation** – Use `tiktoken` or similar to approximate token usage difference between dense and router setups.
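
For exercise 3, the LLM router does not expose its class scores, so the effect of temperature is easier to see on a simulated softmax. The sketch below is an illustration only: the logits are invented, and `softmax` and `sample_routes` are hypothetical helpers, not part of LangChain.

```python
import math
import random
from collections import Counter

def softmax(logits, temperature=1.0):
    """Softmax over logits divided by temperature (higher = flatter)."""
    if temperature <= 0:
        raise ValueError('temperature must be positive')
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_routes(logits, labels, temperature, n=1000, seed=0):
    """Sample n routing decisions and count how often each expert is picked."""
    rng = random.Random(seed)  # seeded so repeated runs give the same counts
    probs = softmax(logits, temperature)
    return Counter(rng.choices(labels, weights=probs, k=n))

labels = ['ANIMALS', 'PLANTS', 'GEO']
logits = [2.0, 1.0, 0.5]  # made-up router scores for one query
for t in (0.5, 1.0, 5.0):
    probs = [round(p, 3) for p in softmax(logits, t)]
    print(t, probs, dict(sample_routes(logits, labels, t)))
```

As the temperature grows, the probabilities move toward uniform and the sampled counts spread across experts; as it shrinks, routing approaches a hard argmax.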