# Lab 5 — ExpertRAG End‑to‑End Demo
Combine **MoE‑style expert routing**, **retrieval‑augmented generation**, and **guardrails** into one pipeline.

In [None]:
!pip -q install langchain langchain-community sentence-transformers faiss-cpu openai guardrails-ai ragas matplotlib tqdm
import warnings, os; warnings.filterwarnings('ignore')

## Environment Setup
| Variable | Purpose |
|----------|---------|
| `OPENAI_API_KEY` | LLM calls via LangChain/OpenAI |
| `PINECONE_API_KEY`, `PINECONE_ENV` | (Optional) Use Pinecone instead of local FAISS |

Set locally:
```bash
export OPENAI_API_KEY="sk-..."
```
Or in Colab:
```python
import os
os.environ['OPENAI_API_KEY']='sk-...'
```

In [None]:
import os, openai
if not os.getenv('OPENAI_API_KEY'):
    raise ValueError('OPENAI_API_KEY missing')
openai.api_key = os.environ['OPENAI_API_KEY']
print('OpenAI key loaded.')

## 1. Prepare Domain Corpora
For demo purposes we’ll create two tiny domain‑specific corpora (medical & finance). Replace with your own documents in production.

In [None]:
medical_docs = [
    'Hypertension is a chronic medical condition in which the blood pressure in the arteries is persistently elevated.',
    'The normal resting heart rate for adults ranges from 60 to 100 beats per minute.'
]
finance_docs = [
    'The S&P 500 is a stock market index tracking the stock performance of 500 large companies listed on exchanges in the United States.',
    'Compound interest is the addition of interest to the principal sum of a loan or deposit.'
]
print('Loaded corpora: medical', len(medical_docs), '| finance', len(finance_docs))

## 2. Embed & Index with FAISS

In [None]:
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
embed = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')

store_med = FAISS.from_texts(medical_docs, embed)
store_fin = FAISS.from_texts(finance_docs, embed)
retrievers = {'MEDICAL': store_med.as_retriever(search_kwargs={'k':3}),
             'FINANCE': store_fin.as_retriever(search_kwargs={'k':3})}
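
To make the retrieval step concrete, here is a minimal pure-Python sketch of what a top-k similarity search does conceptually. The toy 3-dimensional vectors and the helper names `cosine` and `top_k` are illustrative assumptions. Real `all-MiniLM-L6-v2` embeddings have 384 dimensions, and FAISS uses optimized index structures (the LangChain wrapper ranks by L2 distance by default rather than cosine similarity).

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=3):
    """Return up to k (index, score) pairs, best match first."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

toy_docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # made-up vectors, not real embeddings
toy_query = [0.9, 0.1, 0.0]
hits = top_k(toy_query, toy_docs, k=3)
print(hits)
```

With only two documents per corpus, asking for `k=3` simply yields two hits, which mirrors what the retrievers above return on these tiny corpora.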

## 3. Define Experts & Router

In [None]:
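from langchain_community.chat_models import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm_expert = ChatOpenAI(model='gpt-4o-mini', temperature=0)

# Expert prompts used when no context is retrieved.
prompt_med = ChatPromptTemplate.from_messages([
    ('system','You are a medical expert. Answer medically.'),
    ('user','{query}')
])
prompt_fin = ChatPromptTemplate.from_messages([
    ('system','You are a finance expert. Answer with financial knowledge.'),
    ('user','{query}')
])

router_prompt = ChatPromptTemplate.from_messages([
    ('system',
     'Classify the user question into one of the following domains: MEDICAL, FINANCE.\n'
     'If unsure choose FINANCE. Respond with the class name only.'),
    ('user','{query}')
])

class DomainRouter:
    """Minimal LLM classifier: run(query) returns one of the allowed domain names."""
    def __init__(self, llm, prompt, choices, default):
        # A plain prompt | llm | parser chain acts as the domain classifier.
        self.chain = prompt | llm | StrOutputParser()
        self.choices, self.default = choices, default

    def run(self, query):
        label = self.chain.invoke({'query': query}).strip().upper()
        # Fall back to the default domain if the model returns anything unexpected.
        return label if label in self.choices else self.default

router = DomainRouter(
    llm=ChatOpenAI(model='gpt-3.5-turbo', temperature=0),
    prompt=router_prompt,
    choices=['MEDICAL','FINANCE'],
    default='FINANCE'
)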
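
The router prompt asks for the class name only, but chat models sometimes add casing, punctuation, or a short preamble. The sketch below shows one tolerant way to map raw model text onto the allowed domains, falling back to FINANCE as the prompt instructs. The function name `normalize_label` and its parameters are hypothetical and not part of LangChain.

```python
import re

def normalize_label(raw, choices=('MEDICAL', 'FINANCE'), default='FINANCE'):
    """Return the first allowed domain name found in the model output, else the default."""
    for token in re.findall(r'[A-Za-z]+', raw.upper()):
        if token in choices:
            return token
    return default

print(normalize_label(' medical.\n'))
print(normalize_label('Domain: Finance'))
print(normalize_label('I am not sure'))
```

Returning the first match keeps the behavior predictable when the model mentions more than one domain; a stricter variant could treat that case as ambiguous and use the default instead.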

### 3.1 Conditional Retrieval Decision
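Retrieval is conditional: a keyword heuristic, separate from the domain router, decides whether to retrieve context for a query. Queries containing lookup-style phrases such as "define" or "what is" trigger retrieval; all others go straight to the expert prompt.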

In [None]:
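def need_retrieval(query):
    """Keyword heuristic: retrieve when the query looks like a factual lookup."""
    q = query.lower()  # lowercase once instead of once per keyword
    return any(k in q for k in ['define','explain','what is','when','history'])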
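
Because `need_retrieval` uses plain substring matching, a keyword such as `when` also fires on "whenever". The following sketch shows a word-boundary variant using the same keyword list; the name `need_retrieval_wb` is a hypothetical alternative, not part of the lab code.

```python
import re

TRIGGERS = ['define', 'explain', 'what is', 'when', 'history']
# \b anchors each keyword to word boundaries; re.escape keeps the space in 'what is' literal.
_TRIGGER_RE = re.compile(
    r'\b(?:' + '|'.join(re.escape(t) for t in TRIGGERS) + r')\b',
    re.IGNORECASE,
)

def need_retrieval_wb(query):
    """Return True only if a trigger keyword appears as a whole word or phrase."""
    return _TRIGGER_RE.search(query) is not None

for q in ['What is hypertension?', 'Whenever rates rise, bonds fall.']:
    print(q, '->', need_retrieval_wb(q))
```

The trade-off is that inflected forms such as "explained" no longer match, so the keyword list may need extending if those should trigger retrieval.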

## 4. Guardrails
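Use a toxicity validator and a simple grounding checker. The `ToxicLanguage` validator is distributed through the Guardrails Hub, so install it once with `guardrails hub install hub://guardrails/toxic_language` before running the cell below. The grounding checker flags every answer sentence that does not appear verbatim in the retrieved chunks, which is deliberately strict.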

In [None]:
from guardrails import Guard, OnFailAction
from guardrails.hub import ToxicLanguage

guard_toxic = Guard().use(ToxicLanguage(threshold=0.7, validation_method='sentence', on_fail=OnFailAction.EXCEPTION))

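def grounding_checker(answer, retrieved):
    """Return the answer sentences that do not appear verbatim in any retrieved chunk."""
    missing = []
    for sent in answer.split('. '):
        sent = sent.strip().rstrip('.')  # drop the period left on the final sentence
        if sent and not any(sent.lower() in d.lower() for d in retrieved):
            missing.append(sent)
    return missing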
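
A verbatim substring check rejects almost any paraphrased answer. As a softer illustration, the sketch below scores each answer sentence by the fraction of its tokens that occur in the best-matching retrieved chunk. The name `soft_grounding_checker` and the `0.6` threshold are assumptions chosen for the example, not tuned values.

```python
import re

def _tokens(text):
    """Lowercased alphanumeric tokens as a set."""
    return set(re.findall(r'[a-z0-9]+', text.lower()))

def soft_grounding_checker(answer, retrieved, threshold=0.6):
    """Return (sentence, score) pairs whose best token overlap with any chunk is below threshold."""
    chunk_tokens = [_tokens(c) for c in retrieved]
    flagged = []
    # Split after sentence-ending punctuation followed by whitespace.
    for sent in re.split(r'(?<=[.!?])\s+', answer.strip()):
        words = _tokens(sent)
        if not words:
            continue
        best = max((len(words & ct) / len(words) for ct in chunk_tokens), default=0.0)
        if best < threshold:
            flagged.append((sent, round(best, 2)))
    return flagged

chunks = ['The normal resting heart rate for adults ranges from 60 to 100 beats per minute.']
answer = ('For adults the normal resting heart rate ranges from 60 to 100 beats per minute. '
          'Bananas are yellow.')
flagged = soft_grounding_checker(answer, chunks)
print(flagged)
```

Token overlap is still a crude proxy: it ignores word order and negation, so it can pass a sentence that contradicts the source while reusing its vocabulary.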

## 5. ExpertRAG Pipeline Function

In [None]:
def expert_rag(query):
    domain = router.run(query)
    retrieve = need_retrieval(query)
    context_chunks = []
    if retrieve:
        context_docs = retrievers[domain].invoke(query)
        context_chunks = [d.page_content for d in context_docs]
        # Pass context and query as template variables so braces in either
        # are not parsed as placeholders.
        prompt = ChatPromptTemplate.from_messages([
            ('system', 'You are a {domain} expert. Use the provided CONTEXT to answer.\nCONTEXT:\n{context}'),
            ('user', '{query}')
        ])
        messages = prompt.format_messages(
            domain=domain.lower(), context='\n'.join(context_chunks), query=query)
    else:
        prompt = prompt_med if domain=='MEDICAL' else prompt_fin
        messages = prompt.format_messages(query=query)
    # invoke() returns a message object; the guardrails below need its text.
    answer = llm_expert.invoke(messages).content
    # Guardrails
    guard_toxic.validate(answer)
    if retrieve:
        missing = grounding_checker(answer, context_chunks)
        if missing:
            raise ValueError(f'Answer not fully grounded: {missing}')
    return domain, retrieve, answer
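
Since every stage of `expert_rag` calls an external model or index, it can help to exercise the control flow offline first. The sketch below restates the same route → optionally retrieve → generate → validate flow with injected callables. Everything in it (`run_pipeline`, `fake_corpus`, and the stub functions) is hypothetical scaffolding for testing, not part of the lab pipeline.

```python
def run_pipeline(query, route, should_retrieve, retrieve, generate, validate):
    """Route, optionally retrieve, generate, then validate. All stages are injected callables."""
    domain = route(query)
    use_context = should_retrieve(query)
    chunks = retrieve(domain, query) if use_context else []
    answer = generate(domain, query, chunks)
    validate(answer, chunks)  # expected to raise on failure
    return domain, use_context, answer

# Stubs standing in for the LLM router, the FAISS retrievers, the expert LLM, and the guardrails.
fake_corpus = {
    'MEDICAL': ['Hypertension is persistently elevated blood pressure.'],
    'FINANCE': ['Compound interest adds interest to the principal.'],
}

def route(query):
    return 'MEDICAL' if 'hypertension' in query.lower() else 'FINANCE'

def should_retrieve(query):
    return 'what is' in query.lower()

def retrieve(domain, query):
    return fake_corpus[domain]

def generate(domain, query, chunks):
    # Echo the first chunk when context exists, otherwise return a placeholder.
    return chunks[0] if chunks else f'({domain} answer without context)'

def validate(answer, chunks):
    if chunks and answer not in chunks:
        raise ValueError('ungrounded answer')

result = run_pipeline('What is hypertension?', route, should_retrieve, retrieve, generate, validate)
print(result)
```

Injecting the stages this way also makes it straightforward to swap in a different router or validator per domain without touching the control flow.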

## 6. Demo Queries

In [None]:
queries = [
    'What is hypertension?',
    'Explain compound interest.',
    'Give me health advice about heart rate.',
    'Define EBITDA.'
]
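for q in queries:
    try:
        d, r, a = expert_rag(q)
        print(f'[{d} | retrieve={r}] {q}\n{a}\n')
    except Exception as e:  # a toxicity or grounding guardrail rejected the answer
        print(f'[blocked] {q}\n{e}\n')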

## ✏️ Exercises (Lab 5)
1. **Add a Tech Expert** – create a technology corpus, index it, extend router choices, and test.
2. **Retrieval Decision Tuning** – replace the `need_retrieval` heuristic with an LLM‑based confidence score.
3. **Expert‑Level Guardrails** – apply a stricter validator only for medical answers (e.g., block medical advice).
4. **Latency & Cost Measurement** – profile ExpertRAG against a single dense GPT‑4 call on 20 mixed queries.
5. **Grounding Metric** – integrate `ragas` faithfulness metric to auto‑grade each ExpertRAG response.
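
As a possible starting point for Exercise 4, the sketch below times any callable over a list of queries and summarizes the latencies. The helpers `time_calls` and `summarize` and the stand-in `fake_pipeline` are hypothetical; swap in `expert_rag` or a direct LLM call to collect real numbers. Cost tracking is not covered here and would need token counts from the API responses.

```python
import statistics
import time

def time_calls(fn, queries):
    """Call fn(q) for each query and return the per-call latencies in seconds."""
    latencies = []
    for q in queries:
        start = time.perf_counter()
        fn(q)
        latencies.append(time.perf_counter() - start)
    return latencies

def summarize(latencies):
    """Basic latency statistics for one pipeline."""
    return {
        'n': len(latencies),
        'mean_s': statistics.mean(latencies),
        'median_s': statistics.median(latencies),
        'max_s': max(latencies),
    }

def fake_pipeline(query):
    # Stand-in for expert_rag or a direct LLM call.
    time.sleep(0.001)
    return query.upper()

stats = summarize(time_calls(fake_pipeline, ['q1', 'q2', 'q3']))
print(stats)
```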