<a href="https://colab.research.google.com/github/dipanjanS/mastering-intelligent-agents-langgraph-workshop-dhs2025/blob/main/Module-4-Building-Advanced-Agentic-AI-Systems/M4LC3_Build_a_Multi_Agent_System_for_Utilization_Review.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Build a Multi-Agent System for Utilization Review with LangGraph

![](https://i.imgur.com/44rj36e.png)

### What is a utilization review?

A utilization review is a process in which a patient's care plan undergoes evaluation, typically for inpatient services on a case-by-case basis.

The **review determines the medical necessity of procedures and might make recommendations for alternative care or treatment**. Hospitals usually employ a utilization review (UR) healthcare professional who communicates with the insurance company to evaluate the criteria needed to approve surgeries or treatments.


### Multi-Agent System for Clinical Decision Support

The diagram illustrates a **multi-agent architecture** designed for healthcare decision support.  
It consists of several specialized agents, datasets, and tools orchestrated under a supervisor.

![](https://i.imgur.com/EzayV5W.png)

#### 1. Supervisor Agent
- Central coordinator of the workflow.  
- Uses a **System Prompt** and an **LLM** to decide which specialized agent should handle a task.  
- Collects results and routes them to produce the **Final Decision Output**.

#### 2. Specialized Agents
- **Clinical Intake Agent (📄)**  
  - Gathers patient information.  
  - Interacts with the **Patient Records** dataset through the **Fetch Patient Record** tool.  

- **Guideline Checker Agent (📊)**  
  - Validates care recommendations against medical standards.  
  - Uses the **Match Guideline** and **Check Guideline Validity** tools with the **Medical Guidelines** dataset.  

- **Care Recommender Agent (💡)**  
  - Suggests appropriate treatments or interventions.  
  - Accesses the **Care Recommendations** dataset using the **Fetch Recommendation** tool.  

Each agent runs on a **System Prompt** + **LLM** combination for reasoning.

#### 3. Tools Layer
- Interfaces that allow agents to access external knowledge.  
- Includes:
  - 📄 **Fetch Patient Record**
  - 📊 **Match Guideline**
  - ✅ **Check Guideline Validity**
  - 💡 **Fetch Recommendation**

#### 4. Datasets Layer
- **Patient Records**  
- **Medical Guidelines**  
- **Care Recommendations**  

These serve as the structured data sources powering the agents’ reasoning.

#### 5. Final Output
- The **Supervisor Agent** aggregates results from the intake, guideline validation, and recommendation agents.  
- Produces a **Final Decision Output** for clinical use.

---

**Summary:**  
This system demonstrates how **multi-agent collaboration** can enhance healthcare workflows. Each agent has a well-defined role (intake, validation, or recommendation), while the supervisor distributes tasks and makes the final decision.







## Install OpenAI, LangGraph and LangChain dependencies

In [None]:
!pip install langchain==0.3.27 langchain-community==0.3.27 langchain-openai==0.3.30 langgraph==0.6.5 --quiet

## Configure API Keys & Environment

Set your OpenAI API key in the environment so that `ChatOpenAI` can read it.

In [None]:
import os
import getpass

# OpenAI API key (used by ChatOpenAI)
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key (https://platform.openai.com/account/api-keys):\n")

## Sample Data (Guidelines, Care Plans, Patient Records)

This notebook uses **in-memory** Python lists to keep the demo self-contained.  
You can later swap these out for a database or API without changing the agent’s logic.
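
One way to keep that swap cheap is to route every lookup through a small accessor function, so that tools never index the lists directly. The sketch below is an assumption about how this could be arranged, not part of the notebook: `find_patient`, its `records` parameter, and `SAMPLE_RECORDS` are hypothetical names, and the sample row is copied from the patient data used later in this section.

```python
from typing import Optional

# Sample row copied from the notebook's patient data.
SAMPLE_RECORDS = [
    {
        "patient_id": "P101",
        "age": 38,
        "sex": "Male",
        "symptoms": ["abdominal pain", "nausea"],
        "diagnosis": "Possible early appendicitis",
        "procedure": "CT Abdomen",
        "notes": "Mild abdominal pain and nausea but no localized tenderness or rebound noted.",
    },
]


def find_patient(patient_id: str, records: list) -> Optional[dict]:
    """Hypothetical accessor: return the matching record, or None if absent."""
    for record in records:
        if record["patient_id"] == patient_id:
            return record
    return None


print(find_patient("P101", SAMPLE_RECORDS)["procedure"])  # CT Abdomen
print(find_patient("P999", SAMPLE_RECORDS))               # None
```

Replacing the loop with a database query or an HTTP call would leave callers unchanged, provided the function still returns a dict with the same fields.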

### What’s included
- **`medical_guidelines`** — evidence-style rules the agent can match against.
  - Fields: `procedure`, `diagnosis`, `required_symptoms` (list), `notes` (free text).
  - Example:
    ```python
    {
      "procedure": "CT Abdomen",
      "diagnosis": "Suspected Appendicitis",
      "required_symptoms": ["abdominal pain", "nausea", "RLQ tenderness"],
      "notes": "CT imaging justified if appendicitis is unclear."
    }
    ```
- **`care_recommendations`** — next-step suggestions keyed by diagnosis.
  - Fields: `diagnosis`, `next_step`.
  - Example:
    ```python
    {
      "diagnosis": "Community-Acquired Pneumonia",
      "next_step": "Start empirical antibiotics; reserve CT for poor responders."
    }
    ```
- **`patient_records`** — patient data including what **procedure** has been recommended to them based on their **diagnosis**.
  - Fields: `patient_id`, `age`, `sex`, `symptoms` (list), `diagnosis`, `procedure`, `notes`.
  - Example:
    ```python
    {
      "patient_id": "P101",
      "age": 38,
      "sex": "Male",
      "symptoms": ["abdominal pain", "nausea"],
      "diagnosis": "Possible early appendicitis",
      "procedure": "CT Abdomen",
      "notes": "Mild abdominal pain and nausea but no localized tenderness or rebound noted."
    }
    ```

### How the agent uses these
- **Guideline matching**: `procedure` + `diagnosis` → pick the closest entry in `medical_guidelines`.
- **Validity check**: compare `patient_records[*].symptoms` vs. `required_symptoms` and read `notes`.
- **Care plan**: map `diagnosis` → `care_recommendations[*].next_step`.
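
The validity check in this notebook is delegated to the LLM, but the underlying comparison can be sketched deterministically. The helper below is a hypothetical pre-check (the name `missing_symptoms` is not from the notebook) that uses case-insensitive exact matching only, so it treats paraphrases as missing.

```python
def missing_symptoms(patient_symptoms, required_symptoms):
    """Return the required symptoms that are not documented for the patient.

    Matching is case-insensitive but otherwise exact; no synonyms are resolved.
    """
    documented = {s.strip().lower() for s in patient_symptoms}
    return [r for r in required_symptoms if r.strip().lower() not in documented]


# P101 against the "CT Abdomen" guideline from the sample data
print(missing_symptoms(
    ["abdominal pain", "nausea"],
    ["abdominal pain", "nausea", "RLQ tenderness"],
))
# ['RLQ tenderness']

# P103 against the "MRI Brain" guideline: "recurrent headache" is not an exact match
print(missing_symptoms(["recurrent headache"], ["headache", "nausea"]))
# ['headache', 'nausea']
```

The second call shows why exact matching alone is too brittle for free-text symptoms, which is one reason to let an LLM make the comparison and read the `notes` as well.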

In [None]:
medical_guidelines = [
    {"procedure": "MRI Brain", "diagnosis": "Migraine", "required_symptoms": ["headache", "nausea"],
     "notes": "MRI not recommended unless neurological deficits or red flags present."},
    {"procedure": "CT Chest", "diagnosis": "Suspected Pulmonary Embolism", "required_symptoms": ["chest pain", "shortness of breath", "tachycardia"],
     "notes": "CTPA appropriate for high probability PE cases with positive D-dimer."},
    {"procedure": "MRI Lumbar Spine", "diagnosis": "Chronic Low Back Pain", "required_symptoms": ["back pain > 6 weeks", "neurological deficit"],
     "notes": "MRI only if pain persists despite conservative therapy and neuro signs are present."},
    {"procedure": "CT Chest", "diagnosis": "Community-Acquired Pneumonia", "required_symptoms": ["fever", "cough"],
     "notes": "CT Chest reserved for inconclusive X-rays or immunocompromised patients."},
    {"procedure": "CT Abdomen", "diagnosis": "Suspected Appendicitis", "required_symptoms": ["abdominal pain", "nausea", "RLQ tenderness"],
     "notes": "CT imaging justified if appendicitis is unclear."}
]

care_recommendations = [
    {"diagnosis": "Migraine", "next_step": "Start migraine treatment; imaging not necessary unless red flags appear."},
    {"diagnosis": "Suspected Pulmonary Embolism", "next_step": "Begin anticoagulation and confirm with CTPA."},
    {"diagnosis": "Chronic Low Back Pain", "next_step": "Refer to physiotherapy; MRI only if neuro symptoms persist."},
    {"diagnosis": "Community-Acquired Pneumonia", "next_step": "Start empirical antibiotics; reserve CT for poor responders."},
    {"diagnosis": "Suspected Appendicitis", "next_step": "Do CT to confirm and refer for surgery if positive."}
]

patient_records = [
    {
        "patient_id": "P101",
        "age": 38,
        "sex": "Male",
        "symptoms": ["abdominal pain", "nausea"],
        "diagnosis": "Possible early appendicitis", "procedure": "CT Abdomen",
        "notes": "Mild abdominal pain and nausea but no localized tenderness or rebound noted."
    },
    {
        "patient_id": "P102",
        "age": 65,
        "sex": "Female",
        "symptoms": ["chest pain", "shortness of breath", "tachycardia"],
        "diagnosis": "Clinical suspicion of PE", "procedure": "CT Chest",
        "notes": "Wells score high probability; D-dimer positive."
    },
    {
        "patient_id": "P103",
        "age": 30,
        "sex": "Female",
        "symptoms": ["recurrent headache"],
        "diagnosis": "Classic migraine presentation", "procedure": "MRI Brain",
        "notes": "No neuro signs or red flags. Typical migraine pattern."
    }
]

### Tools for the Utilization Review Agent

These are **tools** (decorated with `@tool`) that the agent can call during a review.  
They encapsulate domain logic and return **small, structured dicts** the agent can reason over.

#### Summary of tools

| Tool | Purpose | Inputs | Output keys |
|---|---|---|---|
| `fetch_patient_record` | Retrieve and summarize a patient chart from in-memory data | `patient_id: str` | `patient_summary` _(str)_, or `error` |
| `match_guideline` | Pick the closest clinical guideline for a (procedure, diagnosis) pair using the LLM | `procedure: str`, `diagnosis: str` | `matched_guideline` _(str)_ |
| `check_guideline_validity` | Validate whether patient symptoms/notes meet the guideline’s criteria | `symptoms: list[str]`, `required_symptoms: list[str]`, `notes: str` | `validity_result` _(str)_ |
| `recommend_care_plan` | Suggest next steps for the given diagnosis | `diagnosis: str` | `recommendation` _(str)_ |
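
Because `fetch_patient_record` can return an `error` key instead of its usual output key, code that consumes these dicts outside the agent loop may want to check for it first. A minimal sketch, assuming a hypothetical helper named `unwrap_tool_result` that is not part of the notebook:

```python
def unwrap_tool_result(result: dict, key: str) -> str:
    """Return result[key], or raise LookupError if the tool reported an error."""
    if "error" in result:
        raise LookupError(result["error"])
    return result[key]


ok = {"recommendation": "Begin anticoagulation and confirm with CTPA."}
print(unwrap_tool_result(ok, "recommendation"))

try:
    unwrap_tool_result({"error": "Patient record not found."}, "patient_summary")
except LookupError as exc:
    print(f"Lookup failed: {exc}")
```

Inside the agents this is not strictly needed, since the LLM reads the returned dict and can react to the error text itself.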

> All LLM-backed tools use `ChatOpenAI` (`gpt-4o-mini`, temperature = 0) and return **concise textual justifications** under a single key.

#### Typical call order used by the agent
1. `fetch_patient_record(patient_id)` → summarize context  
2. `match_guideline(procedure, diagnosis)` → find best-fit rule  
3. `check_guideline_validity(symptoms, required_symptoms, notes)` → approve vs. needs review  
4. `recommend_care_plan(diagnosis)` → action steps / alternatives

#### Example outputs (shape)
```json
// fetch_patient_record
{ "patient_summary": "Patient ID: P102\nAge: 65, Sex: Female\nReported Symptoms: chest pain, shortness of breath, tachycardia\nPreliminary Diagnosis: Clinical suspicion of PE\nRequested Procedure: CT Chest\nClinical Notes: Wells score high probability; D-dimer positive." }

// match_guideline
{ "matched_guideline": "CTPA is appropriate for high-probability PE with positive D-dimer. Required symptoms: chest pain, shortness of breath, tachycardia. Caveats: ensure renal function adequate for contrast." }

// check_guideline_validity
{ "validity_result": "Criteria met: symptoms align and notes indicate high probability (Wells) with positive D-dimer. Imaging is medically necessary." }

// recommend_care_plan
{ "recommendation": "Begin anticoagulation and confirm with CTPA; monitor hemodynamics; consider risk stratification." }
```


In [None]:
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

@tool
def fetch_patient_record(patient_id: str) -> dict:
    """
    Fetches and summarizes a patient record based on the given patient ID.

    Returns a human-readable summary including age, sex, symptoms, diagnosis, procedure, and clinical notes.

    Args:
        patient_id (str): The unique identifier for the patient.

    Returns:
        dict: {"patient_summary": str} with a natural-language summary of the record,
              or {"error": str} if no record matches the patient ID.
    """
    for record in patient_records:
        if record["patient_id"] == patient_id:
            summary = f"""
                        Patient Record:
                        - Patient ID: {record['patient_id']}
                        - Age: {record['age']}
                        - Sex: {record['sex']}
                        - Reported Symptoms: {', '.join(record['symptoms'])}
                        - Preliminary Diagnosis: {record['diagnosis']}
                        - Requested Procedure: {record['procedure']}
                        - Clinical Notes: {record['notes']}
                      """
            return {
                "patient_summary": summary
            }
    return {"error": "Patient record not found."}


@tool
def match_guideline(procedure: str, diagnosis: str) -> dict:
    """
    Match a given procedure and diagnosis to the most relevant clinical guideline.

    Args:
        procedure (str): The medical procedure being requested.
        diagnosis (str): The diagnosis related to the procedure.

    Returns:
        dict: A summary of the best matching guideline if found, or a message indicating no match.
    """
    context = "\n".join([
        f"{i+1}. Procedure: {g['procedure']}, Diagnosis: {g['diagnosis']}, Required Symptoms: {g['required_symptoms']}, Notes: {g['notes']}"
        for i, g in enumerate(medical_guidelines)])

    prompt = f"""You are a clinical reviewer assessing whether a requested medical procedure aligns with existing evidence-based guidelines.

Instructions:
- Analyze the patient's procedure and diagnosis.
- Compare against the list of provided clinical guidelines.
- Select the guideline that best fits the case by reasoning on the common matches considering procedure and diagnosis.
- If none match, respond: "No appropriate guideline found for this case."
- If a match is found, summarize the matching guideline clearly including any required symptoms or caveats.

Patient Case:
- Procedure: {procedure}
- Diagnosis: {diagnosis}

Available Guidelines:
{context}
"""
    result = llm.invoke(prompt).content
    return {"matched_guideline": result}


@tool
def check_guideline_validity(symptoms: list, required_symptoms: list, notes: str) -> dict:
    """
    Determine whether the patient's symptoms and notes satisfy the guideline criteria for medical necessity.

    Args:
        symptoms (list): List of symptoms recorded in the patient’s record.
        required_symptoms (list): List of symptoms required by the matched guideline.
        notes (str): Free-text clinical notes associated with the patient case.

    Returns:
        dict: {"validity_result": str} with a justification explaining whether the procedure is valid or not.
    """
    prompt = f"""You are validating a medical procedure request based on documented symptoms and clinical context.

Instructions:
- Assess whether the patient's symptoms and notes fulfill the required guideline criteria.
- Consider nuances or indirect references (e.g. "long flight" implies immobility).
- Provide a reasoned judgment if the procedure is medically necessary.
- If it does not qualify, explain exactly which criteria are unmet.

Input:
- Patient Symptoms: {symptoms}
- Required Symptoms from Guideline: {required_symptoms}
- Clinical Notes: {notes}
"""
    result = llm.invoke(prompt).content
    return {"validity_result": result}


@tool
def recommend_care_plan(diagnosis: str) -> dict:
    """
    Recommend a follow-up care plan based on a given diagnosis.

    Args:
        diagnosis (str): The diagnosis to evaluate for next steps.

    Returns:
        dict: A recommendation string describing the suggested care plan or a fallback message if no match is found.
    """
    options = "\n".join([
        f"{i+1}. Diagnosis: {c['diagnosis']}, Recommendation: {c['next_step']}"
        for i, c in enumerate(care_recommendations)])

    prompt = f"""You are a clinical support assistant suggesting appropriate next steps for a given medical diagnosis.

Instructions:
- Analyze the given diagnosis.
- Choose the closest match from the list of known recommendations.
- Explain why the match is appropriate.
- If no suitable recommendation is found, return: "No care recommendation found for this diagnosis."

Diagnosis Provided:
{diagnosis}

Available Recommendations:
{options}
"""
    result = llm.invoke(prompt).content
    return {"recommendation": result}

## Multi-Agent System with Supervisor

### Implement Sub-Agents (Worker Agents)

In [None]:
from langgraph.prebuilt import create_react_agent

clinical_intake_agent = create_react_agent(
    llm,
    tools=[fetch_patient_record],
    prompt="""You are acting as a clinical intake specialist responsible for reviewing and summarizing a patient's medical case.

Your responsibilities:
1. Carefully read the symptoms, diagnosis, procedure, and notes with the right tool.
2. Generate a medically accurate clinical summary (key points).
3. Identify any additional risk factors or inferred clues from notes (e.g., “long flight” → immobility).
4. Derive the clinical rationale for why the procedure may have been ordered.

Your final message must clearly include:
- A clinical summary
- Key clinical findings (explicit or inferred)

Present your output as if it’s being passed to a medical reviewer for decision-making.
"""
)

In [None]:
guideline_checker_agent = create_react_agent(
    llm,
    tools=[match_guideline,
           check_guideline_validity],
    prompt="""You are a utilization review specialist evaluating whether a requested procedure is medically justified.

Your responsibilities:
1. Identify the most relevant clinical guideline based on the procedure and diagnosis.
2. If a matching guideline is found, extract the required symptoms.
3. Then use an appropriate tool to check the guideline validity for the patient using:
   - the patient’s documented symptoms
   - the required symptoms from the guideline
   - clinical notes (from the intake summary)

Guidance:
- Use clinical reasoning to validate justification.
- If symptoms and context meet the criteria, justify the procedure clearly.
- If not, explain why the guideline requirements are not satisfied.
- If no matching guideline is found, state that clearly.

Your message should be clear, objective, and appropriate for escalation or approval review.
"""
)

In [None]:
care_recommender_agent = create_react_agent(
    llm,
    tools=[recommend_care_plan],
    prompt="""
You are a clinical care assistant responsible for recommending follow-up actions based on a confirmed diagnosis.

Instructions:
1. Consider the outputs from the intake and guideline checker agents.
2. If the guideline checker has determined that the procedure is NOT justified, you must:
   - Suggest alternate steps (e.g., reassess symptoms, collect more data), OR
   - Justify why a care plan involving that procedure may still be warranted due to risk factors.
3. If the guideline checker approved the procedure, proceed with care planning as usual.

Important:
- Always clarify whether your recommendation is based on an approved procedure or despite a failed guideline check.
- Avoid assuming procedures are approved if guideline validation failed.

Use precise clinical reasoning.
"""
)


### Create Supervisor Agent Node Function

In [None]:
from typing import Annotated, Literal
from typing_extensions import TypedDict
from langgraph.graph.message import add_messages
from langgraph.types import Command
from langgraph.graph import StateGraph, START, END
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage

# Define shared agents state
class State(TypedDict):
    messages: Annotated[list, add_messages]

members = ["clinical_intake_agent",
           "guideline_checker_agent",
           "care_recommender_agent"]

SUPERVISOR_PROMPT = f"""You are a supervisor agent tasked with managing a healthcare utilization review process
between the following agents:
{members}.

Each agent performs a specific part of the review process:
- clinical_intake_agent summarizes the patient's case and rationale
- guideline_checker_agent evaluates medical necessity against clinical guidelines
- care_recommender_agent suggests appropriate follow-up care

Read the current messages. Decide who should act next.
If the full workflow is complete, respond with FINISH.
"""


class RouterPath(TypedDict):
    next: Literal["clinical_intake_agent",
                  "guideline_checker_agent",
                  "care_recommender_agent",
                  "FINISH"]

def supervisor_node(state: State) -> Command[Literal["clinical_intake_agent",
                                                     "guideline_checker_agent",
                                                     "care_recommender_agent",
                                                     "__end__"]]:
    messages = [SystemMessage(content=SUPERVISOR_PROMPT)] + state["messages"]
    response = llm.with_structured_output(RouterPath).invoke(messages)
    goto = response["next"]
    if goto == "FINISH":
        goto = END

        FINAL_RESPONSE_PROMPT = """Analyze all the results from the execution so far before making the final decision

        Your final response should ONLY include the following bullets in the exact format specified:

        - Final Decision: [APPROVED/NEEDS REVIEW]
        - Decision Reasoning: [What criteria matched or did not match]
        - Care recommendation or alternative steps: [care plan steps to take or alternative steps if it needs review]

        Do NOT add any other extra content in the final response
        """
        messages = [SystemMessage(content=FINAL_RESPONSE_PROMPT)] + state["messages"]
        response = llm.invoke(messages)

        return Command(goto=goto, update={"messages": [AIMessage(content=response.content,
                                                                name="supervisor_agent")],
                                          "next": goto})

    return Command(goto=goto, update={"next": goto})
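
The control flow that `supervisor_node` drives can be sketched without LangGraph or an LLM. In the sketch below everything is a stand-in: `scripted_route` replaces the structured-output router with a fixed order, the workers are hypothetical lambdas, messages are plain dicts, and `max_steps` plays a role similar to the graph's recursion limit.

```python
ORDER = ["clinical_intake_agent", "guideline_checker_agent", "care_recommender_agent"]


def scripted_route(messages):
    """Stand-in for the LLM router: pick the first agent that has not replied yet."""
    replied = {m.get("name") for m in messages}
    for agent in ORDER:
        if agent not in replied:
            return agent
    return "FINISH"


def run_supervisor_loop(route, workers, messages, max_steps=10):
    """Alternate supervisor decisions and worker turns until the router says FINISH."""
    history = list(messages)
    for _ in range(max_steps):
        decision = route(history)
        if decision == "FINISH":
            return history
        history.append(workers[decision](history))
    raise RuntimeError("router did not return FINISH within max_steps")


# Each hypothetical worker just tags a reply with its own name.
workers = {
    name: (lambda history, n=name: {"name": n, "content": f"{n} done"})
    for name in ORDER
}

history = run_supervisor_loop(
    scripted_route,
    workers,
    [{"role": "user", "content": "Review patient P101 for procedure justification."}],
)
print([m.get("name", "user") for m in history])
# ['user', 'clinical_intake_agent', 'guideline_checker_agent', 'care_recommender_agent']
```

In the real graph the router is free to pick agents in any order or to repeat one, which is why a step limit matters.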

### Create Sub-Agents Node Functions

In [None]:
def clinical_intake_node(state: State) -> Command[Literal["supervisor_agent"]]:
    result = clinical_intake_agent.invoke(state)
    return Command(
        update={"messages": [AIMessage(content=result["messages"][-1].content,
                                          name="clinical_intake_agent")]},
        goto="supervisor_agent"
    )

def guideline_checker_node(state: State) -> Command[Literal["supervisor_agent"]]:
    result = guideline_checker_agent.invoke(state)
    return Command(
        update={"messages": [AIMessage(content=result["messages"][-1].content,
                                          name="guideline_checker_agent")]},
        goto="supervisor_agent"
    )

def care_recommender_node(state: State) -> Command[Literal["supervisor_agent"]]:
    result = care_recommender_agent.invoke(state)
    return Command(
        update={"messages": [AIMessage(content=result["messages"][-1].content,
                                          name="care_recommender_agent")]},
        goto="supervisor_agent"
    )
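
The three node functions differ only in the agent they call and the name they attach, so they could be produced by one factory. The sketch below illustrates the closure pattern only: `make_worker_node` and `EchoAgent` are hypothetical, messages are plain dicts rather than `AIMessage` objects, and a plain dict stands in for the `Command` return value so the example runs without LangGraph installed.

```python
class EchoAgent:
    """Stand-in for a compiled agent: exposes invoke(state) and appends one reply."""

    def __init__(self, reply):
        self.reply = reply

    def invoke(self, state):
        return {"messages": state["messages"] + [{"content": self.reply}]}


def make_worker_node(agent, name):
    """Build a node function that runs `agent` and hands control back to the supervisor."""

    def node(state):
        result = agent.invoke(state)
        last = result["messages"][-1]
        # In the notebook this would be a Command(update=..., goto=...).
        return {
            "update": {"messages": [{"name": name, "content": last["content"]}]},
            "goto": "supervisor_agent",
        }

    return node


intake_node = make_worker_node(EchoAgent("intake summary"), "clinical_intake_agent")
print(intake_node({"messages": [{"content": "Review patient P101"}]}))
```

Writing the nodes out by hand, as the notebook does, keeps each one easy to customize; the factory mainly helps when the number of workers grows.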

### Build the Multi-Agent Graph

In [None]:
graph_builder = StateGraph(State)

graph_builder.add_node("supervisor_agent", supervisor_node)
graph_builder.add_node("clinical_intake_agent", clinical_intake_node)
graph_builder.add_node("guideline_checker_agent", guideline_checker_node)
graph_builder.add_node("care_recommender_agent", care_recommender_node)

graph_builder.add_edge(START, "supervisor_agent")

multi_agent = graph_builder.compile()

In [None]:
from IPython.display import Image, display

display(Image(multi_agent.get_graph(xray=True).draw_mermaid_png()))

### Get Agent Response Formatting Utilities

In [None]:
!gdown 1dSyjcjlFoZpYEqv4P9Oi0-kU2gIoolMB

In [None]:
from agent_utils import format_message
from IPython.display import Markdown, display

def call_agent_system(agent, prompt, verbose=False):

    for event in agent.stream(
        input={"messages": [{"role": "user", "content": prompt}]},
        config={"recursion_limit": 25},
        stream_mode='values' #returns full agent state with all messages including updates
    ):
        if verbose:
            format_message(event["messages"][-1])

    print('\n\nFinal Response:\n')
    display(Markdown(event["messages"][-1].content))
    return event["messages"]

### Stream Agent Execution

In [None]:
prompt = "Review patient P101 for procedure justification."
response = call_agent_system(multi_agent, prompt, verbose=True)

In [None]:
prompt = "Review patient P102 for procedure justification."
response = call_agent_system(multi_agent, prompt, verbose=True)

In [None]:
prompt = "Review patient P103 for procedure justification."
response = call_agent_system(multi_agent, prompt, verbose=False)