# 01 - Agentic Routing Engine: End-to-End Pipeline

This notebook demonstrates the **full agent system** from user query to final response.

## Architecture

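```
User Message
     │
     ▼
┌──────────────────────────────────────────────────┐
│  1. MEMORY RECALL (ST turns + LT facts)          │
│     → Recent conversation + distilled knowledge  │
└────────────────────────┬─────────────────────────┘
                         │
                         ▼
┌──────────────────────────────────────────────────┐
│  2. ROUTER (LLM intent classification)           │
│     → crm | rag | web_search | direct            │
└────────────────────────┬─────────────────────────┘
                         │
            ┌────────────┼─────────────┐
            ▼            ▼             ▼
       ┌─────────┐  ┌─────────┐  ┌───────────┐
       │   CRM   │  │   RAG   │  │ Web Search│
       │  Tool   │  │  Tool   │  │   Tool    │
       └────┬────┘  └────┬────┘  └─────┬─────┘
            │            │             │
            └────────────┼─────────────┘
                         ▼
┌──────────────────────────────────────────────────┐
│  4. SYNTHESISER (LLM merges memory + tool output)│
│     → Natural, personalised response             │
└────────────────────────┬─────────────────────────┘
                         │
                         ▼
┌──────────────────────────────────────────────────┐
│  5. MEMORY STORE + DISTILL                       │
│     → Save turn to ST · Extract LT facts         │
└──────────────────────────────────────────────────┘
```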
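
The stages in the diagram can be read as one straight-line function. The following is a minimal sketch with stubbed stages, not the project's actual `agent.chat()` implementation: `run_pipeline`, `PipelineResult`, and every callable passed in are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PipelineResult:
    """Hypothetical result container (the notebook's real type is AgentResponse)."""
    route: str
    tool_output: Optional[str]
    answer: str


def run_pipeline(
    message: str,
    recall: Callable[[str], str],
    route: Callable[[str, str], str],
    tools: dict[str, Callable[[str], str]],
    synthesise: Callable[[str, str, Optional[str]], str],
    store: Callable[[str, str], None],
) -> PipelineResult:
    memory = recall(message)                           # memory recall
    chosen = route(message, memory)                    # router picks a route
    tool = tools.get(chosen)                           # "direct" has no tool
    tool_output = tool(message) if tool else None      # tool dispatch
    answer = synthesise(message, memory, tool_output)  # synthesiser
    store(message, answer)                             # memory store (+ distill)
    return PipelineResult(chosen, tool_output, answer)


# Toy usage: every stage is a stub, so no LLM or database is needed.
saved = []
result = run_pipeline(
    "What is the hand hygiene policy?",
    recall=lambda m: "",
    route=lambda m, mem: "rag" if "policy" in m else "direct",
    tools={"rag": lambda m: "KB: wash hands before patient contact"},
    synthesise=lambda m, mem, out: out or "Hello!",
    store=lambda m, a: saved.append((m, a)),
)
print(result.route, "|", result.answer)
```

Passing the stages in as callables keeps the control flow visible and makes each stage easy to replace with a stub in tests.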

## What You'll Learn
1. How the **router** classifies user intent with JSON-structured LLM output
2. How **3 tools** (CRM, RAG, Web Search) are dispatched based on the route
3. How **memory** (short-term + long-term) is recalled and injected as context
4. How the **synthesiser** merges tool output with memory to produce a final answer
5. How **LangFuse** traces every step for full observability

## Prerequisites
- `.env` configured with API keys (OpenRouter, Supabase, Qdrant, Tavily, LangFuse)
- CRM seeded: `make seed-crm-xl`
- KB ingested: `make ingest-qdrant`
- Procedures seeded: `python scripts/seed_procedures.py`

In [None]:
# Setup
import os
import sys

sys.path.insert(0, "../src")

from dotenv import load_dotenv
load_dotenv()

# Configure loguru (clean notebook-friendly output)
from infrastructure.log import setup_logging
from loguru import logger

setup_logging("INFO", for_notebook=True)

logger.success("Environment loaded")

## Step 1 · Build the Agent

`build_agent()` is a factory function that wires together:
- **LLM** (chat model via OpenRouter)
- **Embedder** (OpenAI embeddings)
- **Memory stores** (Short-term + Long-term + Episodic + Procedural)
- **Tools** (CRM, RAG, Web Search)
- **Router** (LLM-based intent classifier)
- **Synthesiser** (LLM response generator)
- **LangFuse** (observability - traces, spans, costs)
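
As a rough illustration of the factory pattern, the sketch below builds a toy agent whose tools are only created when their flag is on. The attribute names `crm_tool`, `rag_tool`, and `web_tool` and the `enable_*` flags mirror the ones used in this notebook; `ToyAgent`, `build_toy_agent`, `available_routes`, and the lambda tools are assumptions made for the example and say nothing about how `build_agent()` is implemented internally.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Tool = Callable[[str], str]  # hypothetical tool signature: query in, text out


@dataclass
class ToyAgent:
    """Illustrative container; a disabled tool is simply None."""
    crm_tool: Optional[Tool] = None
    rag_tool: Optional[Tool] = None
    web_tool: Optional[Tool] = None

    def available_routes(self) -> list[str]:
        routes = ["direct"]  # needs no tool, so it is always available
        if self.crm_tool:
            routes.append("crm")
        if self.rag_tool:
            routes.append("rag")
        if self.web_tool:
            routes.append("web_search")
        return routes


def build_toy_agent(enable_crm: bool = True, enable_rag: bool = True,
                    enable_web: bool = True) -> ToyAgent:
    """Construct only the tools that are switched on."""
    return ToyAgent(
        crm_tool=(lambda q: f"[crm] {q}") if enable_crm else None,
        rag_tool=(lambda q: f"[rag] {q}") if enable_rag else None,
        web_tool=(lambda q: f"[web] {q}") if enable_web else None,
    )


toy = build_toy_agent(enable_web=False)
print(toy.available_routes())  # ['direct', 'crm', 'rag']
```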

In [None]:
from agents import build_agent, AgentResponse

agent = build_agent(enable_crm=True, enable_rag=True, enable_web=True)

logger.success("Agent built successfully")
logger.info(f"   LLM model   : {agent._model_name()}")
logger.info(f"   CRM tool    : {'✓' if agent.crm_tool else '✗'}")
logger.info(f"   RAG tool    : {'✓' if agent.rag_tool else '✗'}")
logger.info(f"   Web search  : {'✓' if agent.web_tool else '✗'}")

ℹ️ CAG cache HIT (sim=1.000): 'What is the Braden Scale for pressure injury risk?' → matched 'What is the Braden Scale for pressure injury risk?'
ℹ️ CAG cache HIT (sim=1.000): 'How does incident reporting work?' → matched 'How does incident reporting work?'
ℹ️ CAG cache HIT (sim=1.000): 'What cardiology services does Nawaloka offer?' → matched 'What cardiology services does Nawaloka offer?'


## Step 2 · Helper — Pretty-Print Agent Response

We'll use this throughout to visualise every field in `AgentResponse`.

In [None]:
import re
import pandas as pd
from datetime import datetime
from sqlalchemy.orm import sessionmaker
from infrastructure.db import get_sql_engine
from infrastructure.db.crm_models import Patient, Booking, Doctor, Location, Specialty


# Intelligent phone extraction
# Users type phone numbers in many formats.  This helper handles
# them all and normalises to the CRM's `external_user_id` format
# (e.g. "94781030736" - digits only, with country prefix).

def extract_phone(text: str) -> str:
    """
    Extract and normalise a Sri Lankan phone number from free-form text.

    Handles all common formats:
      "+94 78 10 30 736"  →  94781030736
      "0781030736"        →  94781030736
      "94781030736"       →  94781030736
      "+94-781-030-736"   →  94781030736
      "078 103 0736"      →  94781030736
      "(078) 103-0736"    →  94781030736

    Returns:
        Normalised number (digits only, '94' country prefix) matching
        the CRM `patients.external_user_id` column.

    Raises:
        ValueError: if no phone-like sequence is found in *text*, or if the
            digits do not match a recognised Sri Lankan format.
    """
    # Match a phone-like run: optional '+' then digits mixed with spaces / dashes / dots / parens
    match = re.search(r"\+?[\d][\d\s\-\.\(\)]{7,18}[\d]", text)
    if not match:
        raise ValueError("No phone number found in the message!")

    # Strip everything except digits
    raw = re.sub(r"\D", "", match.group())

    # Normalise to Sri Lankan international format (94XXXXXXXXX)
    if raw.startswith("0") and len(raw) == 10:
        # Local format: 0781030736 → 94781030736
        raw = "94" + raw[1:]
    elif raw.startswith("94") and len(raw) == 11:
        # Already international - keep as-is
        pass
    elif len(raw) == 9 and not raw.startswith("94"):
        # Bare subscriber number without prefix (781030736)
        raw = "94" + raw
    else:
        # Anything else would silently fail to match `patients.external_user_id`
        raise ValueError(f"Unrecognised phone number format: {match.group()!r}")

    logger.info(f"   Normalised → {raw}")
    return raw


def show_response(resp: AgentResponse, label: str = "") -> None:
    """Pretty-print an AgentResponse."""
    print("=" * 72)
    if label:
        print(f"  {label}")
    print("=" * 72)
    print(f"\n  Route     : {resp.route}")
    if resp.action:
        print(f"  Action    : {resp.action}")
    print(f"⏱  Latency   : {resp.latency_ms} ms")

    # Memory context used (first 8 lines only)
    if resp.memory_context and resp.memory_context.strip():
        ctx_lines = resp.memory_context.strip().split("\n")
        print("\n  Memory Context:")
        for line in ctx_lines[:8]:
            print(f"   {line}")
        if len(ctx_lines) > 8:
            print(f"   ... ({len(ctx_lines) - 8} more lines)")
    else:
        print("\n  Memory Context: (none)")

    # Tool output (first 10 lines only)
    if resp.tool_output and resp.tool_output.strip():
        out_lines = resp.tool_output.strip().split("\n")
        print("\n  Tool Output:")
        for line in out_lines[:10]:
            print(f"   {line}")
        if len(out_lines) > 10:
            print(f"   ... ({len(out_lines) - 10} more lines)")

    # Final answer
    print(f"\n Answer:")
    print(f"   {resp.answer}")
    print("=" * 72)


# CRM table helpers
# These query the Supabase CRM tables directly and display
# the results as pandas DataFrames so students can see what's
# actually stored in the database.

def _crm_session():
    """Create a fresh SQLAlchemy session for CRM queries."""
    return sessionmaker(bind=get_sql_engine())()


def show_patient_table(user_id: str) -> dict | None:
    """Look up a patient by external_user_id and display as a table."""
    session = _crm_session()
    try:
        patient = (
            session.query(Patient)
            .filter(Patient.external_user_id == user_id)
            .first()
        )
        if not patient:
            logger.warning(f"No patient found for user_id={user_id}")
            return None

        info = {
            "Patient ID": patient.patient_id,
            "Full Name": patient.full_name,
            "Phone": patient.phone,
            "Email": patient.email or "-",
            "DOB": patient.dob or "-",
            "Gender": {"M": "Male", "F": "Female", "X": "Other"}.get(
                patient.gender or "", patient.gender or "-"
            ),
            "Notes": (patient.notes or "-")[:80],
        }
        df = pd.DataFrame([{"Field": k, "Value": v} for k, v in info.items()])
        print(" Patient Record  (Supabase → patients table)")
        display(df.style.hide(axis="index"))
        return patient.to_dict()
    finally:
        session.close()


def show_bookings_table(user_id: str) -> None:
    """Show the user's 10 most recent bookings (newest first, any status) as a table."""
    session = _crm_session()
    try:
        patient = (
            session.query(Patient)
            .filter(Patient.external_user_id == user_id)
            .first()
        )
        if not patient:
            return

        rows_raw = (
            session.query(Booking, Doctor, Location)
            .join(Doctor, Booking.doctor_id == Doctor.doctor_id)
            .join(Location, Booking.location_id == Location.location_id)
            .filter(Booking.patient_id == patient.patient_id)
            .order_by(Booking.start_at.desc())
            .limit(10)
            .all()
        )
        if not rows_raw:
            print(" No bookings found for this patient.")
            return

        rows = []
        for bk, doc, loc in rows_raw:
            spec = (
                session.get(Specialty, doc.specialty_id)
                if doc.specialty_id else None
            )
            rows.append({
                "Date": datetime.fromtimestamp(bk.start_at).strftime("%Y-%m-%d %H:%M"),
                "Doctor": f"Dr. {doc.full_name}",
                "Specialty": spec.name if spec else "-",
                "Location": loc.name,
                "Status": bk.status,
                "Reason": bk.reason or "-",
            })

        df = pd.DataFrame(rows)
        print(f"\n Bookings  ({len(rows)} records from Supabase → bookings table)")
        display(df)
    finally:
        session.close()


logger.success("Helpers ready: show_response, show_patient_table, show_bookings_table")

## Step 3 · User Identification — Extract Identity from Chat

In a real system, the user's identity comes through the conversation itself
(e.g. *"Hi, my mobile is 077…"*), not a hardcoded variable.

Here we simulate that: the user sends a greeting that includes their **phone number**.
The `extract_phone()` helper normalises the common ways a Sri Lankan mobile number is typed:

| User types | Normalised to |
|------------|---------------|
| `+94 78 10 30 736` | `94781030736` |
| `0781030736` | `94781030736` |
| `+94-781-030-736` | `94781030736` |
| `078 103 0736` | `94781030736` |
| `(078) 103-0736` | `94781030736` |

After extraction we look up the CRM and display the patient record + bookings
so you can see exactly what the agent has access to.

In [None]:
# User sends a greeting with their phone number
# In production this arrives via WhatsApp / web-chat; here we
# just type it into the notebook.
# Try changing the format - the extractor handles them all:
#   "+94 78 10 30 736"  |  "0781030736"  |  "+94-781-030-736"

greeting = "Hi there! I'm Anushka, and my mobile number is +94 78 10 30 736."

# Phone extraction + normalisation
USER_ID = extract_phone(greeting)
logger.success(f"Extracted phone → USER_ID = {USER_ID}")

SESSION_ID = "demo-01"
logger.info(f"Session         : {SESSION_ID}")

# Verify the user exists in the CRM and show the DB records
print("\n" + "=" * 72)
print("CRM Lookup - verifying user in Supabase …")
print("=" * 72)

patient_info = show_patient_table(USER_ID)
if patient_info:
    show_bookings_table(USER_ID)
else:
    logger.warning("Patient not in CRM - CRM demos will return 'not found'.")

---

## Demo 1 · `direct` Route — Greeting (No Tool Needed)

Simple conversational queries are answered **directly** by the LLM.
No tool is dispatched. Memory context (if any) is still injected.
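
To make "memory context is injected" concrete, here is a minimal sketch of assembling a context block under a size budget and prepending it to the prompt. It is an assumption-laden stand-in: `build_memory_context`, the `FACT:`/`TURN:` prefixes, and the use of a word count as a crude proxy for tokens are all invented for this example and are not taken from `memory/memory_ops.py`.

```python
def build_memory_context(
    recent_turns: list[str],
    long_term_facts: list[str],
    max_words: int = 60,
) -> str:
    """Combine LT facts and the newest ST turns without exceeding a word budget."""
    # Facts go first: they are short and carry durable information.
    candidates = [f"FACT: {fact}" for fact in long_term_facts]
    # Turns follow, newest first, so the oldest turns are dropped when space runs out.
    candidates += [f"TURN: {turn}" for turn in reversed(recent_turns)]

    lines, used = [], 0
    for line in candidates:
        cost = len(line.split())  # crude token estimate (assumption)
        if used + cost > max_words:
            break
        lines.append(line)
        used += cost
    return "\n".join(lines)


context = build_memory_context(
    recent_turns=["user: Hi, I'm Anushka", "assistant: Hello Anushka!"],
    long_term_facts=["Allergic to penicillin"],
    max_words=8,
)
prompt = f"Context:\n{context}\n\nUser: Can you help me today?"
print(prompt)
```

With a budget of 8 words, the fact and the newest turn fit, while the older turn is dropped.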

In [None]:
# Send the same greeting the user typed (with phone number)
resp = agent.chat(
    user_message=greeting,
    user_id=USER_ID,
    session_id=SESSION_ID,
)

show_response(resp, label="DIRECT route - greeting (identity extracted from this message)")

### What happened?

1. **Memory recall** — checked for recent turns + long-term facts (empty on first message)
2. **Router** — classified as `direct` (no tool needed for a greeting)
3. **Synthesiser** — generated a friendly response directly
4. **Memory store** — saved this turn to short-term memory for future context
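
The router step depends on the LLM returning a machine-readable decision. The sketch below shows one way such a reply could be validated. The four route names come from this notebook, but the JSON shape (`route` plus optional `action`), the `parse_route` helper, and the choice to fall back to `direct` on malformed output are assumptions, not the contents of `agents/router.py`.

```python
import json

VALID_ROUTES = {"crm", "rag", "web_search", "direct"}


def parse_route(raw: str) -> dict:
    """Parse the router's JSON reply; fall back to 'direct' on anything malformed."""
    fallback = {"route": "direct", "action": None}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return fallback  # the model replied with prose instead of JSON
    if not isinstance(data, dict):
        return fallback
    route = data.get("route")
    if not isinstance(route, str) or route not in VALID_ROUTES:
        return fallback  # unknown or missing route
    return {"route": route, "action": data.get("action")}


print(parse_route('{"route": "crm", "action": "search_doctors"}'))
print(parse_route("Sure! I think this is a CRM question."))
```

Falling back to `direct` means a badly formatted router reply degrades to a plain conversational answer instead of raising an error mid-conversation.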

---

## Demo 2 · `crm` Route — Patient Lookup

When the user asks about appointments, records, or doctors, the router
dispatches to the **CRM Tool**, which queries the CRM's PostgreSQL database (hosted on Supabase).

In [None]:
resp = agent.chat(
    user_message="Can you check my upcoming appointments?",
    user_id=USER_ID,
    session_id=SESSION_ID,
)

show_response(resp, label="CRM route - lookup_patient")

# Behind the scenes - actual DB records
print("\n What's actually in the database for this user:")
show_bookings_table(USER_ID)

---

## Demo 3 · `crm` Route — Search Doctors

The CRM tool also supports doctor search by specialty.

In [None]:
resp = agent.chat(
    user_message="I need to see a cardiologist. Who's available?",
    user_id=USER_ID,
    session_id=SESSION_ID,
)

show_response(resp, label="CRM route - search_doctors")

# Behind the scenes - cardiologists in the DB
session = _crm_session()
try:
    docs = (
        session.query(Doctor, Specialty)
        .join(Specialty, Doctor.specialty_id == Specialty.specialty_id)
        .filter(Specialty.name.ilike("%cardiol%"), Doctor.active == 1)
        .limit(10)
        .all()
    )
    if docs:
        df = pd.DataFrame([{
            "Doctor": f"Dr. {d.full_name}",
            "Specialty": s.name,
            "Phone": d.phone or "-",
            "License": d.license_no or "-",
        } for d, s in docs])
        print("\n Cardiologists in the DB  (Supabase → doctors table):")
        display(df)
finally:
    session.close()

---

## Demo 4 · `rag` Route — Internal Knowledge Base (Hospital Policies)

When the user asks about hospital policies, procedures, or services,
the router dispatches to the **RAG Tool** which:
1. Checks the **CAG cache** for instant answers
2. On cache miss → searches **Qdrant** (parent-child chunks) → generates via LLM
3. Caches the result for next time
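
The three steps above can be sketched as a cache keyed by embedding similarity. This is an in-memory illustration only: the real tool uses Qdrant for both the knowledge base and the cache, and `CachedAnswerer`, `toy_embed`, the 0.9 threshold, and the stub `retrieve`/`generate` callables are all hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class CachedAnswerer:
    """Look up answers by query similarity; on a miss, retrieve + generate + cache."""

    def __init__(self, embed, retrieve, generate, threshold: float = 0.9):
        self.embed, self.retrieve, self.generate = embed, retrieve, generate
        self.threshold = threshold  # assumed value, not the project's setting
        self.cache: list[tuple[list[float], str]] = []  # (query vector, answer)

    def answer(self, query: str) -> tuple[str, bool]:
        """Return (answer, cache_hit)."""
        vec = self.embed(query)
        # 1. Cache check: find the most similar previously answered query.
        best = max(self.cache, key=lambda item: cosine(vec, item[0]), default=None)
        if best is not None and cosine(vec, best[0]) >= self.threshold:
            return best[1], True
        # 2. Cache miss: retrieve supporting chunks, then generate an answer.
        result = self.generate(query, self.retrieve(query))
        # 3. Cache the result for next time.
        self.cache.append((vec, result))
        return result, False


def toy_embed(text: str) -> list[float]:
    """Keyword-count 'embedding', only good enough for this demo."""
    words = text.lower().split()
    return [float(words.count(w)) for w in ("hand", "hygiene", "visiting", "hours")]


qa = CachedAnswerer(
    embed=toy_embed,
    retrieve=lambda q: ["(retrieved chunk)"],
    generate=lambda q, chunks: f"answer to: {q}",
)
first = qa.answer("What is the hand hygiene policy?")  # miss: generated and cached
second = qa.answer("hand hygiene rules")               # hit: same toy embedding
print(first, second)
```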

In [None]:
resp = agent.chat(
    user_message="What is the hospital's hand hygiene policy?",
    user_id=USER_ID,
    session_id=SESSION_ID,
)

show_response(resp, label="RAG route - internal KB (Qdrant KB + Qdrant CAG cache)")

---

## Demo 5 · `web_search` Route — Real-Time Information

For questions needing live data (hours, directions, news), the router
dispatches to **Tavily Web Search**.

In [None]:
resp = agent.chat(
    user_message="What are the current visiting hours at Nawaloka Hospital?",
    user_id=USER_ID,
    session_id=SESSION_ID,
)

show_response(resp, label="WEB SEARCH route - Tavily")

---

## Demo 6 · Memory Continuity — Allergy Trigger + Recall

This tests the full memory loop:
1. **User mentions critical info** ("I'm allergic to penicillin, always remember")
2. The **distiller** extracts it as a long-term fact
3. A follow-up question tests whether the agent **recalls** it correctly
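
A trigger-gated distiller might look like the sketch below. The phrase list is an assumption inferred from the example messages in this notebook, and `should_distill`, `distill`, and the lambda extractor are hypothetical stand-ins for the LLM-backed extraction that the real distiller performs.

```python
import re

# Assumed trigger phrases, inferred from this notebook's example messages.
TRIGGERS = re.compile(r"\b(always remember|please remember)\b", re.IGNORECASE)


def should_distill(message: str) -> bool:
    """True when the message contains an explicit 'remember this' cue."""
    return bool(TRIGGERS.search(message))


def distill(message: str, extract_facts) -> list[str]:
    """Run the (normally LLM-backed) extractor only when a trigger is present."""
    return extract_facts(message) if should_distill(message) else []


facts = distill(
    "By the way, I'm allergic to penicillin. Please always remember that.",
    extract_facts=lambda m: ["Allergic to penicillin"],  # stub for the LLM call
)
print(facts)  # ['Allergic to penicillin']
```

Gating the extractor on a cheap check avoids spending an extra LLM call on turns that carry nothing worth storing.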

In [None]:
# The phrase "always remember" triggers distillation
resp = agent.chat(
    user_message="By the way, I'm allergic to penicillin. Please always remember that.",
    user_id=USER_ID,
    session_id=SESSION_ID,
)

show_response(resp, label="Memory trigger - 'always remember'")

In [None]:
# Now test recall - the agent should remember everything from this session
resp = agent.chat(
    user_message="What do you know about me so far?",
    user_id=USER_ID,
    session_id=SESSION_ID,
)

show_response(resp, label="Memory recall - full context test")

---

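## Demo 7 · Multi-Turn Conversation (Mixed Routes)

A complete conversation that exercises the `direct`, `crm`, and `rag` routes in sequence,
showing how context builds progressively. It runs in a fresh session, and the final turn asks about
allergies that were mentioned only in Demo 6, so a correct answer there has to come from
long-term memory rather than from this session's turns.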

In [None]:
# Fresh session for a clean multi-turn demo
MULTI_SESSION = "demo-multiturn"

conversation = [
    "Hello, it's Anushka again. I need some help today.",               # → direct
    "I take atenolol 50mg daily for blood pressure, please remember.",   # → direct + distill
    "Can you find a cardiologist for me?",                               # → crm (search_doctors)
    "What is the emergency evacuation procedure at the hospital?",       # → rag
    "What medications and allergies do you have on file for me?",        # → direct (from memory)
]

print(" Multi-Turn Conversation Flow")
print("=" * 72)

for i, msg in enumerate(conversation, 1):
    print(f"\n Turn {i}: {msg}")
    print("-" * 72)
    
    resp = agent.chat(
        user_message=msg,
        user_id=USER_ID,
        session_id=MULTI_SESSION,
    )
    
    print(f"  Route: {resp.route}" + (f" / {resp.action}" if resp.action else ""))
    print(f"  {resp.latency_ms}ms")
    print(f" {resp.answer[:250]}{'...' if len(resp.answer) > 250 else ''}")

print("\n" + "=" * 72)
logger.success(" Multi-turn conversation complete!")
print(" Check LangFuse dashboard for the full trace waterfall.")

---

## Demo 8 · Observability — LangFuse Traces

Every `.chat()` call creates a **LangFuse trace** with nested spans:

```
agent_chat (trace)
  ├─ memory_recall (span)
  │    └─ memory_recall_inner (span)
  ├─ router (generation)        ← LLM call: tokens, cost, model
  ├─ tool_dispatch (span)
  │    └─ crm_dispatch | rag_search | web_search (span)
  ├─ synthesiser (generation)   ← LLM call: tokens, cost, model
  ├─ memory_store (span)
  └─ memory_distill (span)
       └─ distill_facts (generation) ← if triggered
```

Open your **LangFuse dashboard** → Traces to see:
- Per-step latency waterfall
- Token usage and cost per LLM call
- Input/output of every step
- Memory context that was injected
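
To show how nested spans produce a latency waterfall, here is a tiny in-memory tracer. It is explicitly not the LangFuse SDK: `Tracer` and its `span` context manager are hypothetical, and only the span names are borrowed from the tree above.

```python
import time
from contextlib import contextmanager


class Tracer:
    """Records depth, name, and elapsed time for nested spans."""

    def __init__(self):
        self.records = []
        self._depth = 0

    @contextmanager
    def span(self, name: str):
        record = {"depth": self._depth, "name": name, "ms": None}
        self.records.append(record)  # appended on entry, so order matches the tree
        self._depth += 1
        start = time.perf_counter()
        try:
            yield record
        finally:
            record["ms"] = (time.perf_counter() - start) * 1000
            self._depth -= 1


tracer = Tracer()
with tracer.span("agent_chat"):
    with tracer.span("memory_recall"):
        pass
    with tracer.span("router"):
        pass
    with tracer.span("tool_dispatch"):
        with tracer.span("rag_search"):
            pass

# Print an indented waterfall, one line per span.
for r in tracer.records:
    print("  " * r["depth"] + f"{r['name']}: {r['ms']:.3f} ms")
```

Because a parent span closes only after its children, its elapsed time always includes theirs, which is what makes the per-step waterfall add up.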

In [None]:
from infrastructure.observability import flush

# Flush any pending LangFuse events
flush()

langfuse_host = os.getenv("LANGFUSE_HOST", "https://cloud.langfuse.com")
print(f" LangFuse dashboard: {langfuse_host}")
print("   Navigate to Traces → filter by tag 'agent' to see all traces from this notebook.")
print("   Each trace shows the full pipeline: recall → route → dispatch → synthesise → store")

---

## Summary

| Component | Role | Key File |
|-----------|------|----------|
| **Router** | LLM classifies intent → `crm \| rag \| web_search \| direct` | `agents/router.py` |
| **CRM Tool** | Patient lookup, doctor search, booking CRUD | `agents/tools/crm_tool.py` |
| **RAG Tool** | Qdrant KB search + Qdrant CAG cache → LLM answer | `agents/tools/rag_tool.py` |
| **Web Search** | Tavily real-time search | `agents/tools/web_search_tool.py` |
| **Synthesiser** | Merges memory + tool output → natural response | `agents/orchestrator.py` |
| **Memory Recall** | ST turns + LT facts with token budget | `memory/memory_ops.py` |
| **Memory Distill** | Extracts LT facts from conversation | `memory/memory_ops.py` |
| **LangFuse** | Traces every step (cost, latency, I/O) | `infrastructure/observability.py` |

### Next Notebooks
- **02**: Deep dive into memory capture and distillation (4 memory types)
- **03**: Memory store and recall — how context makes the agent smarter